Model Gallery

36 models from 1 repositories

Filter by type:

Filter by tags:

qwopus3.6-35b-a3b-v1
# Qwen3.6-35B-A3B [](https://chat.qwen.ai) > [!Note] > This repository contains model weights and configuration files for the post-trained model in the Hugging Face Transformers format. > > These artifacts are compatible with Hugging Face Transformers, vLLM, SGLang, KTransformers, etc. Following the February release of the Qwen3.5 series, we're pleased to share the first open-weight variant of Qwen3.6. Built on direct feedback from the community, Qwen3.6 prioritizes stability and real-world utility, offering developers a more intuitive, responsive, and genuinely productive coding experience. ## Qwen3.6 Highlights This release delivers substantial upgrades, particularly in - **Agentic Coding:** the model now handles frontend workflows and repository-level reasoning with greater fluency and precision. - **Thinking Preservation:** we've introduced a new option to retain reasoning context from historical messages, streamlining iterative development and reducing overhead. For more details, please refer to our blog post Qwen3.6-35B-A3B. ## Model Overview ...

Repository: localaiLicense: apache-2.0

qwen3.6-35b-a3b-claude-4.6-opus-reasoning-distilled
# 🔥 Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled A reasoning SFT fine-tune of `Qwen/Qwen3.6-35B-A3B` on chain-of-thought (CoT) distillation mostly sourced from Claude Opus 4.6. The goal is to preserve Qwen3.6's strong agentic coding and reasoning base while nudging the model toward structured Claude Opus-style reasoning traces and more stable long-form problem solving. The training path is text-only. The Qwen3.6 base architecture includes a vision encoder, but this fine-tuning run did not train on image or video examples. - **Developed by:** @hesamation - **Base model:** `Qwen/Qwen3.6-35B-A3B` - **License:** apache-2.0 This fine-tuning run is inspired by Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled, including the notebook/training workflow style and Claude Opus reasoning-distillation direction. [](https://x.com/Hesamation) [](https://discord.gg/vtJykN3t) ## Benchmark Results The MMLU-Pro pass used 70 total questions per model: `--limit 5` across 14 MMLU-Pro subjects. Treat this as a smoke/comparative check, not a release-quality full benchmark. ...

Repository: localaiLicense: apache-2.0

qwen3.6-35b-a3b-apex
# Qwen3.6-35B-A3B [](https://chat.qwen.ai) > [!Note] > This repository contains model weights and configuration files for the post-trained model in the Hugging Face Transformers format. > > These artifacts are compatible with Hugging Face Transformers, vLLM, SGLang, KTransformers, etc. Following the February release of the Qwen3.5 series, we're pleased to share the first open-weight variant of Qwen3.6. Built on direct feedback from the community, Qwen3.6 prioritizes stability and real-world utility, offering developers a more intuitive, responsive, and genuinely productive coding experience. ## Qwen3.6 Highlights This release delivers substantial upgrades, particularly in - **Agentic Coding:** the model now handles frontend workflows and repository-level reasoning with greater fluency and precision. - **Thinking Preservation:** we've introduced a new option to retain reasoning context from historical messages, streamlining iterative development and reducing overhead. For more details, please refer to our blog post Qwen3.6-35B-A3B. ## Model Overview ...

Repository: localaiLicense: apache-2.0

qwen3.6-35b-a3b
# Qwen3.6-35B-A3B [](https://chat.qwen.ai) > [!Note] > This repository contains model weights and configuration files for the post-trained model in the Hugging Face Transformers format. > > These artifacts are compatible with Hugging Face Transformers, vLLM, SGLang, KTransformers, etc. Following the February release of the Qwen3.5 series, we're pleased to share the first open-weight variant of Qwen3.6. Built on direct feedback from the community, Qwen3.6 prioritizes stability and real-world utility, offering developers a more intuitive, responsive, and genuinely productive coding experience. ## Qwen3.6 Highlights This release delivers substantial upgrades, particularly in - **Agentic Coding:** the model now handles frontend workflows and repository-level reasoning with greater fluency and precision. - **Thinking Preservation:** we've introduced a new option to retain reasoning context from historical messages, streamlining iterative development and reducing overhead. For more details, please refer to our blog post Qwen3.6-35B-A3B. ## Model Overview ...

Repository: localaiLicense: apache-2.0

qwen3.5-35b-a3b-apex
Describe the model in a clear and concise way that can be shared in a model gallery.

Repository: localaiLicense: apache-2.0

qwen_qwen3.5-35b-a3b
Qwen3.5-35B-A3B is a quantized multimodal language model with 35B parameters using an A3B MoE architecture. It supports image-text understanding and chat interactions via llama-cpp backend.

Repository: localaiLicense: apache-2.0

vibevoice-cpp
VibeVoice Realtime 0.5B (C++ / GGML, Q8_0) - native C++ port of Microsoft VibeVoice via the vibevoice-cpp backend. 24kHz mono TTS with voice cloning from a single reference voice prompt. Default voice prompt: en-Carter_man.

Repository: localaiLicense: mit

arcee-ai_afm-4.5b
AFM-4.5B is a 4.5 billion parameter instruction-tuned model developed by Arcee.ai, designed for enterprise-grade performance across diverse deployment environments from cloud to edge. The base model was trained on a dataset of 8 trillion tokens, comprising 6.5 trillion tokens of general pretraining data followed by 1.5 trillion tokens of midtraining data with enhanced focus on mathematical reasoning and code generation. Following pretraining, the model underwent supervised fine-tuning on high-quality instruction datasets. The instruction-tuned model was further refined through reinforcement learning on verifiable rewards as well as for human preference. We use a modified version of TorchTitan for pretraining, Axolotl for supervised fine-tuning, and a modified version of Verifiers for reinforcement learning. The development of AFM-4.5B prioritized data quality as a fundamental requirement for achieving robust model performance. We collaborated with DatologyAI, a company specializing in large-scale data curation. DatologyAI's curation pipeline integrates a suite of proprietary algorithms—model-based quality filtering, embedding-based curation, target distribution-matching, source mixing, and synthetic data. Their expertise enabled the creation of a curated dataset tailored to support strong real-world performance. The model architecture follows a standard transformer decoder-only design based on Vaswani et al., incorporating several key modifications for enhanced performance and efficiency. Notable architectural features include grouped query attention for improved inference efficiency and ReLU^2 activation functions instead of SwiGLU to enable sparsification while maintaining or exceeding performance benchmarks. The model available in this repo is the instruct model following supervised fine-tuning and reinforcement learning.

Repository: localaiLicense: apache-2.0

qwen3-235b-a22b-instruct-2507
We introduce the updated version of the Qwen3-235B-A22B non-thinking mode, named Qwen3-235B-A22B-Instruct-2507, featuring the following key enhancements: Significant improvements in general capabilities, including instruction following, logical reasoning, text comprehension, mathematics, science, coding and tool usage. Substantial gains in long-tail knowledge coverage across multiple languages. Markedly better alignment with user preferences in subjective and open-ended tasks, enabling more helpful responses and higher-quality text generation. Enhanced capabilities in 256K long-context understanding.

Repository: localaiLicense: apache-2.0

qwen3-coder-480b-a35b-instruct
Today, we're announcing Qwen3-Coder, our most agentic code model to date. Qwen3-Coder is available in multiple sizes, but we're excited to introduce its most powerful variant first: Qwen3-Coder-480B-A35B-Instruct. featuring the following key enhancements: Significant Performance among open models on Agentic Coding, Agentic Browser-Use, and other foundational coding tasks, achieving results comparable to Claude Sonnet. Long-context Capabilities with native support for 256K tokens, extendable up to 1M tokens using Yarn, optimized for repository-scale understanding. Agentic Coding supporting for most platform such as Qwen Code, CLINE, featuring a specially designed function call format.

Repository: localaiLicense: apache-2.0

qwen3-30b-a1.5b-high-speed
This repo contains the full precision source code, in "safe tensors" format to generate GGUFs, GPTQ, EXL2, AWQ, HQQ and other formats. The source code can also be used directly. This is a simple "finetune" of the Qwen's "Qwen 30B-A3B" (MOE) model, setting the experts in use from 8 to 4 (out of 128 experts). This method close to doubles the speed of the model and uses 1.5B (of 30B) parameters instead of 3B (of 30B) parameters. Depending on the application you may want to use the regular model ("30B-A3B"), and use this model for simpler use case(s) although I did not notice any loss of function during routine (but not extensive) testing. Example generation (Q4KS, CPU) at the bottom of this page using 4 experts / this model. More complex use cases may benefit from using the normal version. For reference: Cpu only operation Q4KS (windows 11) jumps from 12 t/s to 23 t/s. GPU performance IQ3S jumps from 75 t/s to over 125 t/s. (low to mid level card) Context size: 32K + 8K for output (40k total)

Repository: localaiLicense: apache-2.0

qwen3-55b-a3b-total-recall-v1.3-i1
WARNING: MADNESS - UN HINGED and... NSFW. Vivid prose. INTENSE. Visceral Details. Violence. HORROR. GORE. Swearing. UNCENSORED... humor, romance, fun. This repo contains the full precision source code, in "safe tensors" format to generate GGUFs, GPTQ, EXL2, AWQ, HQQ and other formats. The source code can also be used directly. This model is for all use cases, but excels in creative use cases specifically. This model is based on Qwen3-30B-A3B (MOE, 128 experts, 8 activated), with Brainstorm 40X (by DavidAU - details at bottom of this page. This is the refined version -V1.3- from this project (see this repo for all settings, details, system prompts, example generations etc etc): https://huggingface.co/DavidAU/Qwen3-55B-A3B-TOTAL-RECALL-Deep-40X-GGUF/ This version -1.3- is slightly smaller, with further refinements to the Brainstorm adapter. This will change generation and reasoning performance within the model.

Repository: localaiLicense: apache-2.0

qwen3-55b-a3b-total-recall-deep-40x
WARNING: MADNESS - UN HINGED and... NSFW. Vivid prose. INTENSE. Visceral Details. Violence. HORROR. GORE. Swearing. UNCENSORED... humor, romance, fun. Qwen3-55B-A3B-TOTAL-RECALL-Deep-40X-GGUF A highly experimental model ("tamer" versions below) based on Qwen3-30B-A3B (MOE, 128 experts, 8 activated), with Brainstorm 40X (by DavidAU - details at bottom of this page). These modifications blow the model (V1) out to 87 layers, 1046 tensors and 55B parameters. Note that some versions are smaller than this, with fewer layers/tensors and smaller parameter counts. The adapter extensively alters performance, reasoning and output generation. Exceptional changes in creative, prose and general performance. Regens of the same prompt - even with the same settings - will be very different. THREE example generations below - creative (generated with Q3_K_M, V1 model). ONE example generation (#4) - non creative (generated with Q3_K_M, V1 model). You can run this model on CPU and/or GPU due to unique model construction, size of experts and total activated experts at 3B parameters (8 experts), which translates into roughly almost 6B parameters in this version. Two quants uploaded for testing: Q3_K_M, Q4_K_M V3, V4 and V5 are also available in these two quants. V2 and V6 in Q3_k_m only; as are: V 1.3, 1.4, 1.5, 1.7 and V7 (newest) NOTE: V2 and up are from source model 2, V1 and 1.3,1.4,1.5,1.7 are from source model 1.

Repository: localaiLicense: apache-2.0

opencoder-1.5b-base
The model is a large language model with 1.5 billion parameters, trained on 2.5 trillion tokens of code-related data. It supports both English and Chinese languages and is part of the OpenCoder LLM family which also includes 8B base and chat models. The model achieves high performance across multiple language model benchmarks and is one of the most comprehensively open-sourced models available.

Repository: localaiLicense: inf

opencoder-1.5b-instruct
The model is a quantized version of [infly/OpenCoder-1.5B-Instruct](https://huggingface.co/infly/OpenCoder-1.5B-Instruct) created using llama.cpp. The original model, infly/OpenCoder-1.5B-Instruct, is an open and reproducible code LLM family which includes 1.5B and 8B base and chat models, supporting both English and Chinese languages. The model is pretrained on 2.5 trillion tokens composed of 90% raw code and 10% code-related web data, and supervised finetuned on over 4.5M high-quality SFT examples. It achieves high performance across multiple language model benchmarks, positioning it among the leading open-source models for code.

Repository: localaiLicense: inf

llama3.2-3b-enigma
Enigma is a code-instruct model built on Llama 3.2 3b. It is a high quality code instruct model with the Llama 3.2 Instruct chat format. The model is finetuned on synthetic code-instruct data generated with Llama 3.1 405b and supplemented with generalist synthetic data. It uses the Llama 3.2 Instruct prompt format.

Repository: localaiLicense: llama3.2

llama3.2-3b-esper2
Esper 2 is a DevOps and cloud architecture code specialist built on Llama 3.2 3b. It is an AI assistant focused on AWS, Azure, GCP, Terraform, Dockerfiles, pipelines, shell scripts and more, with real world problem solving and high quality code instruct performance within the Llama 3.2 Instruct chat format. Finetuned on synthetic DevOps-instruct and code-instruct data generated with Llama 3.1 405b and supplemented with generalist chat data.

Repository: localaiLicense: llama3.2

llama-3.2-sun-2.5b-chat
Base Model Llama 3.2 1B Extended Size 1B to 2.5B parameters Extension Method Proprietary technique developed by MedIT Solutions Fine-tuning Open (or open subsets allowing for commercial use) open datasets from HF Open (or open subsets allowing for commercial use) SFT datasets from HF Training Status Current version: chat-1.0.0 Key Features Built on Llama 3.2 architecture Expanded from 1B to 2.47B parameters Optimized for open-ended conversations Incorporates supervised fine-tuning for improved performance Use Case General conversation and task-oriented interactions

Repository: localaiLicense: llama3.2

llama3.2-3b-enigma
Enigma is a code-instruct model built on Llama 3.2 3b. It is a high quality code instruct model with the Llama 3.2 Instruct chat format. The model is finetuned on synthetic code-instruct data generated with Llama 3.1 405b and supplemented with generalist synthetic data. It uses the Llama 3.2 Instruct prompt format.

Repository: localaiLicense: llama3.2

eximius_persona_5b
I wanted to create a model with an exceptional capacity for using varied speech patterns and fresh role-play takes. The model had to have a unique personality, not on a surface level but on the inside, for real. Unfortunately, SFT alone just didn't cut it. And I had only 16GB of VRAM at the time. Oh, and I wanted it to be small enough to be viable for phones and to be able to give a fight to larger models while at it. If only there was a magical way to do it. Merges. Merges are quite unique. In the early days, they were considered "fake." Clearly, there's no such thing as merges. Where are the papers? No papers? Then it's clearly impossible. "Mathematically impossible." Simply preposterous. To mix layers and hope for a coherent output? What nonsense! And yet, they were real. Undi95 made some of the earliest merges I can remember, and the "LLAMA2 Era" was truly amazing and innovative thanks to them. Cool stuff like Tiefighter was being made, and eventually the time tested Midnight-Miqu-70B (v1.5 is my personal favorite). Merges are an interesting thing, as they affect LLMs in a way that is currently impossible to reproduce using SFT (or any 'SOTA' technique). One of the plagues we have today, while we have orders of magnitude smarter LLMs, is GPTisms and predictability. Merges can potentially 'solve' that. How? In short, if you physically tear neurons (passthrough brain surgery) while you somehow manage to keep the model coherent enough, and if you're lucky, it can even follows instructions- then magical stuff begins to happen.

Repository: localaiLicense: llama3.2

llama3.1-8b-enigma
Enigma is a code-instruct model built on Llama 3.1 8b. High quality code instruct performance within the Llama 3 Instruct chat format Finetuned on synthetic code-instruct data generated with Llama 3.1 405b. Find the current version of the dataset here! Overall chat performance supplemented with generalist synthetic data. This is the 2024-10-02 release of Enigma for Llama 3.1 8b, enhancing code-instruct and general chat capabilities.

Repository: localaiLicense: llama3.1

Page 1