Model Gallery

67 models from 1 repositories

Filter by type:

Filter by tags:

qwen3.6-27b-heretic-uncensored-finetune-neo-code-di-imatrix-max
Qwen3.6-27B-Heretic2-Uncensored-Finetune-Thinking Yes... fully uncensored AND fine tuned lightly. Freedom and brainpower. Trained on different Heretic base, with different KLD/Refusals. Model fine tune was used to finalize and "firm up" Heretic / uncensored changes. The goal here was light, minor fixes rather than full / heavy fine tune. That being said, the tuning still raised critical metrics. This is Version 2, using "trohrbaugh" Heretic, which has a lower refusal rate, and tuning bumped up the metrics a bit more too. This has also positively impacted "NEO-Coder Di-Matrix" (dual imatrix) GGUF quants as well (vs heretic/non heretic too). https://huggingface.co/DavidAU/Qwen3.6-27B-Heretic-Uncensored-FINETUNE-NEO-CODE-Di-IMatrix-MAX-GGUF ``` IN HOUSE BENCHMARKS [by Nightmedia]: arc-c arc/e boolq hswag obkqa piqa wino Qwen3.6-27B-Heretic2-Uncensored-Finetune-Thinking mxfp8 0.673,0.846,0.905... [instruct mode] Qwen3.6-27B-Heretic-Uncensored-Finetune-Thinking mxfp8 0.669,0.835,0.906,... [instruct mode] BASE UNTUNED MODEL: Qwen3.6-27B HERETIC (by llmfan46) [instruct mode] mxfp8 0.644,0.788,0.902,... ...

Repository: localaiLicense: apache-2.0

carnice-v2-27b
# Carnice-V2-27B for Hermes Agent Carnice-V2-27B is a full merged BF16 SFT of `Qwen/Qwen3.6-27B` for Hermes-style agent traces. This repository contains the standalone merged model weights, not only a LoRA adapter. ## BF16 Transformers Loading Fix The BF16 safetensors were republished with corrected `Qwen3_5ForConditionalGeneration` tensor prefixes. The original merge artifact accidentally serialized an extra Unsloth wrapper prefix, which caused direct HF Transformers loads to report the real weights as unexpected keys and initialize expected layers randomly. GGUF files were not affected because the GGUF conversion path normalized those prefixes. ## Benchmarks The benchmark artifact bundle is included under `benchmarks/`. It contains the rendered graph, extracted `metrics.json`, benchmark scripts, and raw result files used to make the chart. Scope note: the IFEval run is a short `limit=20` A/B smoke benchmark, not an official full leaderboard score. Held-out loss/perplexity is the exact assistant-only training-format validation metric from the SFT script. The raw BFCL two-case smoke files are included for auditability, but they are too small to use as a model-quality claim. ...

Repository: localaiLicense: apache-2.0

qwopus3.6-27b-v1-preview
# Qwen3.6-27B [](https://chat.qwen.ai) > [!Note] > This repository contains model weights and configuration files for the post-trained model in the Hugging Face Transformers format. > > These artifacts are compatible with Hugging Face Transformers, vLLM, SGLang, KTransformers, etc. Following the February release of the Qwen3.5 series, we're pleased to share the first open-weight variant of Qwen3.6. Built on direct feedback from the community, Qwen3.6 prioritizes stability and real-world utility, offering developers a more intuitive, responsive, and genuinely productive coding experience. ## Qwen3.6 Highlights This release delivers substantial upgrades, particularly in - **Agentic Coding:** the model now handles frontend workflows and repository-level reasoning with greater fluency and precision. - **Thinking Preservation:** we've introduced a new option to retain reasoning context from historical messages, streamlining iterative development and reducing overhead. For more details, please refer to our blog post Qwen3.6-27B. ## Model Overview ...

Repository: localaiLicense: apache-2.0

qwen3.6-27b
# Qwen3.6-27B [](https://chat.qwen.ai) > [!Note] > This repository contains model weights and configuration files for the post-trained model in the Hugging Face Transformers format. > > These artifacts are compatible with Hugging Face Transformers, vLLM, SGLang, KTransformers, etc. Following the February release of the Qwen3.5 series, we're pleased to share the first open-weight variant of Qwen3.6. Built on direct feedback from the community, Qwen3.6 prioritizes stability and real-world utility, offering developers a more intuitive, responsive, and genuinely productive coding experience. ## Qwen3.6 Highlights This release delivers substantial upgrades, particularly in - **Agentic Coding:** the model now handles frontend workflows and repository-level reasoning with greater fluency and precision. - **Thinking Preservation:** we've introduced a new option to retain reasoning context from historical messages, streamlining iterative development and reducing overhead. For more details, please refer to our blog post Qwen3.6-27B. ## Model Overview ...

Repository: localaiLicense: apache-2.0

qwen3.5-27b-claude-4.6-opus-reasoning-distilled-heretic-i1

Repository: localaiLicense: apache-2.0

qwen3.5-27b-claude-4.6-opus-reasoning-distilled-i1
Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-i1-GGUF - A GGUF quantized model optimized for local inference. Specialized for reasoning and chain-of-thought tasks. Based on Qwen 3.5 architecture with enhanced language understanding. Available in multiple quantization levels for various hardware requirements. Distilled from Claude-style reasoning models for enhanced logical reasoning capabilities.

Repository: localaiLicense: apache-2.0

q3.5-bluestar-27b

Repository: localaiLicense: mit

qwen3.5-397b-a17b

Repository: localaiLicense: apache-2.0

qwen3.5-27b

Repository: localaiLicense: apache-2.0

vibevoice-cpp-asr
VibeVoice ASR 7B (C++ / GGML, Q4_K) - long-form speech-to-text with speaker diarization. Returns per-speaker JSON segments with start/end timestamps. English-only. ~10 GB download.

Repository: localaiLicense: mit

qwen3-tts-1.7b-custom-voice
Qwen3-TTS is a high-quality text-to-speech model supporting custom voice, voice design, and voice cloning.

Repository: localaiLicense: apache-2.0

qwen3-asr-1.7b
Qwen3-ASR is an automatic speech recognition model supporting multiple languages and batch inference.

Repository: localaiLicense: apache-2.0

mox-small-1-i1
The model, **vanta-research/mox-small-1**, is a small-scale text-generation model optimized for conversational AI tasks. It supports chat, persona research, and chatbot applications. The quantized versions (e.g., i1-Q4_K_M, i1-Q4_K_S) are available for efficient deployment, with the i1-Q4_K_S variant offering the best balance of size, speed, and quality. The model is designed for lightweight inference and is compatible with frameworks like HuggingFace Transformers.

Repository: localaiLicense: apache-2.0

qwen3-asr-1.7b
Qwen3-ASR is an automatic speech recognition model supporting multiple languages and batch inference.

Repository: localaiLicense: apache-2.0

ibm-granite_granite-4.0-h-tiny
Granite-4.0-H-Tiny is a 7B parameter long-context instruct model finetuned from Granite-4.0-H-Tiny-Base using a combination of open source instruction datasets with permissive license and internally collected synthetic datasets. This model is developed using a diverse set of techniques with a structured chat format, including supervised finetuning, model alignment using reinforcement learning, and model merging. Granite 4.0 instruct models feature improved instruction following (IF) and tool-calling capabilities, making them more effective in enterprise applications.

Repository: localaiLicense: apache-2.0

aurore-reveil_koto-small-7b-it
Koto-Small-7B-IT is an instruct-tuned version of Koto-Small-7B-PT, which was trained on MiMo-7B-Base for almost a billion tokens of creative-writing data. This model is meant for roleplaying and instruct usecases.

Repository: localaiLicense: mit

dream-org_dream-v0-instruct-7b
This is the instruct model of Dream 7B, which is an open diffusion large language model with top-tier performance.

Repository: localaiLicense: apache-2.0

smolvlm-instruct
SmolVLM is a compact open multimodal model that accepts arbitrary sequences of image and text inputs to produce text outputs. Designed for efficiency, SmolVLM can answer questions about images, describe visual content, create stories grounded on multiple images, or function as a pure language model without visual inputs. Its lightweight architecture makes it suitable for on-device applications while maintaining strong performance on multimodal tasks.

Repository: localaiLicense: apache-2.0

qwen3-1.7b
Qwen3 is the latest generation of large language models in Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support, with the following key features: Uniquely support of seamless switching between thinking mode (for complex logical reasoning, math, and coding) and non-thinking mode (for efficient, general-purpose dialogue) within single model, ensuring optimal performance across various scenarios. Significantly enhancement in its reasoning capabilities, surpassing previous QwQ (in thinking mode) and Qwen2.5 instruct models (in non-thinking mode) on mathematics, code generation, and commonsense logical reasoning. Superior human preference alignment, excelling in creative writing, role-playing, multi-turn dialogues, and instruction following, to deliver a more natural, engaging, and immersive conversational experience. Expertise in agent capabilities, enabling precise integration with external tools in both thinking and unthinking modes and achieving leading performance among open-source models in complex agent-based tasks. Support of 100+ languages and dialects with strong capabilities for multilingual instruction following and translation. Qwen3-1.7B has the following features: Type: Causal Language Models Training Stage: Pretraining & Post-training Number of Parameters: 1.7B Number of Paramaters (Non-Embedding): 1.4B Number of Layers: 28 Number of Attention Heads (GQA): 16 for Q and 8 for KV Context Length: 32,768

Repository: localaiLicense: apache-2.0

menlo_lucy
Lucy is a compact but capable 1.7B model focused on agentic web search and lightweight browsing. Built on Qwen3-1.7B, Lucy inherits deep research capabilities from larger models while being optimized to run efficiently on mobile devices, even with CPU-only configurations. We achieved this through machine-generated task vectors that optimize thinking processes, smooth reward functions across multiple categories, and pure reinforcement learning without any supervised fine-tuning.

Repository: localaiLicense: apache-2.0

menlo_lucy-128k
Lucy is a compact but capable 1.7B model focused on agentic web search and lightweight browsing. Built on Qwen3-1.7B, Lucy inherits deep research capabilities from larger models while being optimized to run efficiently on mobile devices, even with CPU-only configurations. We achieved this through machine-generated task vectors that optimize thinking processes, smooth reward functions across multiple categories, and pure reinforcement learning without any supervised fine-tuning.

Repository: localaiLicense: apache-2.0

Page 1