LocalAI - Models

minicpm5-1b-claude-opus-fable5-thinking

# MiniCPM5-1B-Claude-Opus-Fable5-Thinking GGUF quantizations for local deployment: **MiniCPM5-1B-Claude-Opus-Fable5-Thinking-GGUF** 中文说明 **MiniCPM5-1B-Claude-Opus-Fable5-Thinking** is a compact 1B **Thinking** language model built on openbmb/MiniCPM5-1B. It is further fine-tuned on **Fable 5** data to improve **coding** and **instruction-following** while keeping MiniCPM5's native Thinking chat template and tool-call format. For llama.cpp / Ollama / LM Studio deployment, see the **GGUF repository**. ## Overview ## Capabilities - **Coding** — code generation, debugging, and software-engineering-style tasks - **Instruction following** — more reliable adherence to user prompts and structured constraints - **Thinking mode** — chain-of-thought reasoning via the MiniCPM5 chat template - **Tool calling** — inherits MiniCPM5's XML tool-call format - **Long context** — up to **128K tokens** (131,072 tokens per `config.json`) ## Quick start ```python from transformers import AutoModelForCausalLM, AutoTokenizer import torch model_id = "GnLOLot/MiniCPM5-1B-Claude-Opus-Fable5-Thinking" ...

Links

https://huggingface.co/GnLOLot/MiniCPM5-1B-Claude-Opus-Fable5-Thinking-GGUF

Tags

serenity-26b-a4b

.mc-wrap{background:#0d1117;color:#c9d1d9;font-family:'Inter',sans-serif;max-width:920px;margin:0 auto;padding:24px;border-radius:16px;box-sizing:border-box} .mc-wrap *{box-sizing:border-box} .mc-wrap h1,.mc-wrap h2,.mc-wrap h3,.mc-wrap h4{color:#e6edf3;border:none} .mc-wrap p{color:#c9d1d9} .mc-wrap strong{color:#7ee8d0} .mc-wrap a{color:#7ee8d0;text-decoration:none} .mc-wrap ul{list-style:none;padding-left:0;margin:0} .mc-wrap li{color:#c9d1d9;margin-bottom:8px;padding-left:4px} .mc-wrap code{background:#161b22;color:#7ee8d0;padding:2px 8px;border-radius:4px;font-family:'JetBrains Mono',monospace;font-size:.88em;border:1px solid rgba(126,232,208,.15)} .mc-hdr{text-align:center;padding:40px 32px;background:#0d1117;border:1px solid #21262d;border-radius:24px;margin-bottom:20px;position:relative;overflow:hidden} .mc-hdr::before{content:'';position:absolute;top:0;left:0;right:0;height:3px;background:linear-gradient(135deg,#7ee8d0,#a78bfa,#c4b5fd)} .mc-name{font-family:'Space Grotesk',sans-serif;font-size:2.8em;font-weight:800;margin:0;letter-spacing:-.02em;background:linear-gradient(135deg,#7ee8d0,#a78bfa,#c4b5fd);-webkit-background-clip:text;-webkit-text-fill-color:transparent;backg ...

Links

https://huggingface.co/ReadyArt/Serenity-26B-A4B-GGUF

Tags

gemma-4-31b-it-qat-q4_0

Gemma 4 31B is the largest dense multimodal (text + image) instruction-tuned model in the Gemma 4 family from Google DeepMind, optimized with Quantization-Aware Training (QAT) to preserve bfloat16-level quality while dramatically reducing the memory required to load the model. This is the official Google Q4_0 GGUF, shipped with its multimodal projector. License: Apache 2.0 | Authors: Google DeepMind

Links

https://huggingface.co/google/gemma-4-31B-it-qat-q4_0-gguf

Tags

lfm2.5-8b-a1b

Try LFM • Docs • LEAP • Discord # LFM2.5-8B-A1B LFM2.5 is a new family of hybrid models designed for on-device deployment. It builds on the LFM2 architecture with extended pre-training and reinforcement learning. - **On-device personal assistant**: Designed to power real-life applications, chaining tool calls, and following complex instructions on all devices. - **Compressed performance**: Competitive with much larger dense and MoE models on instruction following and agentic tasks. - **Unmatched throughput**: Fastest in its size class on both CPU and GPU inference, with day-one support for llama.cpp, MLX, vLLM, and SGLang. Find more information about LFM2.5-8B-A1B in our blog post. **AA-Omniscience Index (higher is better) rewards correct answers and penalizes hallucinations. Scores range from -100 to 100. See more results on Artificial Analysis.* ## 🗒️ Model Details LFM2.5-8B-A1B is a general-purpose text-only model with the following features: ...

Links

https://huggingface.co/LiquidAI/LFM2.5-8B-A1B-GGUF

Tags

gemma-4-31b-it

Google Gemma 4 31B-IT is the largest dense model in the Gemma 4 family with 31B parameters. It handles text and image input, generating text output, with a 256K context window and support for 140+ languages. Provides the highest quality outputs in the Gemma 4 lineup, well-suited for complex reasoning, summarization, and image understanding tasks.

Links

Tags

baidu_ernie-4.5-21b-a3b-thinking

Over the past three months, we have continued to scale the thinking capability of ERNIE-4.5-21B-A3B, improving both the quality and depth of reasoning, thereby advancing the competitiveness of ERNIE lightweight models in complex reasoning tasks. We are pleased to introduce ERNIE-4.5-21B-A3B-Thinking, featuring the following key enhancements: Significantly improved performance on reasoning tasks, including logical reasoning, mathematics, science, coding, text generation, and academic benchmarks that typically require human expertise. Efficient tool usage capabilities. Enhanced 128K long-context understanding capabilities. Note: This version has an increased thinking length. We strongly recommend its use in highly complex reasoning tasks. ERNIE-4.5-21B-A3B-Thinking is a text MoE post-trained model, with 21B total parameters and 3B activated parameters for each token.

Links

Tags

liquidai_lfm2-8b-a1b

LFM2 is a new generation of hybrid models developed by Liquid AI, specifically designed for edge AI and on-device deployment. It sets a new standard in terms of quality, speed, and memory efficiency. We're releasing the weights of our first MoE based on LFM2, with 8.3B total parameters and 1.5B active parameters. LFM2-8B-A1B is the best on-device MoE in terms of both quality (comparable to 3-4B dense models) and speed (faster than Qwen3-1.7B). Code and knowledge capabilities are significantly improved compared to LFM2-2.6B. Quantized variants fit comfortably on high-end phones, tablets, and laptops.

Links

Tags

qwen3-8b-jailbroken

This jailbroken LLM is released strictly for academic research purposes in AI safety and model alignment studies. The author bears no responsibility for any misuse or harm resulting from the deployment of this model. Users must comply with all applicable laws and ethical guidelines when conducting research. A jailbroken Qwen3-8B model using weight orthogonalization[1]. Implementation script: https://gist.github.com/cooperleong00/14d9304ba0a4b8dba91b60a873752d25 [1]: Arditi, Andy, et al. "Refusal in language models is mediated by a single direction." arXiv preprint arXiv:2406.11717 (2024).

Links

Tags

minicpm5-1b

MiniCPM5-1B is a compact 1B-parameter language model based on the LLaMA architecture. Despite its small size, it achieves competitive performance among sub-2B models, making it ideal for edge deployment and resource-constrained environments.

Links

Tags

gemma-3-1b-it

google/gemma-3-1b-it is a large language model with 1 billion parameters. It is part of the Gemma family of open, state-of-the-art models from Google, built from the same research and technology used to create the Gemini models. Gemma 3 models are multimodal, handling text and image input and generating text output, with open weights for both pre-trained variants and instruction-tuned variants. These models have multilingual support in over 140 languages, and are available in more sizes than previous versions. They are well-suited for a variety of text generation and image understanding tasks, including question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in environments with limited resources such as laptops, desktops or your own cloud infrastructure, democratizing access to state of the art AI models and helping foster innovation for everyone.

Links

Tags

huihui-ai_gemma-3-1b-it-abliterated

This is an uncensored version of google/gemma-3-1b-it created with abliteration (see remove-refusals-with-transformers to know more about it). This is a crude, proof-of-concept implementation to remove refusals from an LLM model without using TransformerLens

Links

Tags

amoral-gemma3-1b-v2

Core Function: Produces analytically neutral responses to sensitive queries Maintains factual integrity on controversial subjects Avoids value-judgment phrasing patterns Response Characteristics: No inherent moral framing ("evil slop" reduction) Emotionally neutral tone enforcement Epistemic humility protocols (avoids "thrilling", "wonderful", etc.)

Links

Tags

falcon3-1b-instruct

Falcon3 family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B parameters. This repository contains the Falcon3-1B-Instruct. It achieves strong results on reasoning, language understanding, instruction following, code and mathematics tasks. Falcon3-1B-Instruct supports 4 languages (English, French, Spanish, Portuguese) and a context length of up to 8K.

Links

Tags

falcon3-1b-instruct-abliterated

This is an uncensored version of tiiuae/Falcon3-1B-Instruct created with abliteration (see remove-refusals-with-transformers to know more about it). This is a crude, proof-of-concept implementation to remove refusals from an LLM model without using TransformerLens.

Links

Tags

granite-3.0-1b-a400m-instruct

Granite 3.0 language models are a new set of lightweight state-of-the-art, open foundation models that natively support multilinguality, coding, reasoning, and tool usage, including the potential to be run on constrained compute resources. All the models are publicly released under an Apache 2.0 license for both research and commercial use. The models' data curation and training procedure were designed for enterprise usage and customization in mind, with a process that evaluates datasets for governance, risk and compliance (GRC) criteria, in addition to IBM's standard data clearance process and document quality checks. Granite 3.0 includes 4 different models of varying sizes: Dense Models: 2B and 8B parameter models, trained on 12 trillion tokens in total. Mixture-of-Expert (MoE) Models: Sparse 1B and 3B MoE models, with 400M and 800M activated parameters respectively, trained on 10 trillion tokens in total. Accordingly, these options provide a range of models with different compute requirements to choose from, with appropriate trade-offs with their performance on downstream tasks. At each scale, we release a base model — checkpoints of models after pretraining, as well as instruct checkpoints — models finetuned for dialogue, instruction-following, helpfulness, and safety.

Links

Tags

moe-girl-1ba-7bt-i1

A finetune of OLMoE by AllenAI designed for roleplaying (and maybe general usecases if you try hard enough). PLEASE do not expect godliness out of this, it's a model with 1 billion active parameters. Expect something more akin to Gemma 2 2B, not Llama 3 8B.

Links

Tags

llama-3.2-1b-instruct:q4_k_m

The Meta Llama 3.2 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models in 1B and 3B sizes (text in/text out). The Llama 3.2 instruction-tuned text only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks. They outperform many of the available open source and closed chat models on common industry benchmarks. Model Developer: Meta Model Architecture: Llama 3.2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.

Links

https://huggingface.co/hugging-quants/Llama-3.2-1B-Instruct-Q4_K_M-GGUF

Tags

llama-3.2-1b-instruct:q8_0

The Meta Llama 3.2 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models in 1B and 3B sizes (text in/text out). The Llama 3.2 instruction-tuned text only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks. They outperform many of the available open source and closed chat models on common industry benchmarks. Model Developer: Meta Model Architecture: Llama 3.2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.

Links

https://huggingface.co/hugging-quants/Llama-3.2-1B-Instruct-Q8_0-GGUF

Tags

llama-smoltalk-3.2-1b-instruct

The Llama-SmolTalk-3.2-1B-Instruct model is a lightweight, instruction-tuned model designed for efficient text generation and conversational AI tasks. With a 1B parameter architecture, this model strikes a balance between performance and resource efficiency, making it ideal for applications requiring concise, contextually relevant outputs. The model has been fine-tuned to deliver robust instruction-following capabilities, catering to both structured and open-ended queries. Key Features: Instruction-Tuned Performance: Optimized to understand and execute user-provided instructions across diverse domains. Lightweight Architecture: With just 1 billion parameters, the model provides efficient computation and storage without compromising output quality. Versatile Use Cases: Suitable for tasks like content generation, conversational interfaces, and basic problem-solving. Intended Applications: Conversational AI: Engage users with dynamic and contextually aware dialogue. Content Generation: Produce summaries, explanations, or other creative text outputs efficiently. Instruction Execution: Follow user commands to generate precise and relevant responses.

Links

Tags

fastllama-3.2-1b-instruct

FastLlama is a highly optimized version of the Llama-3.2-1B-Instruct model. Designed for superior performance in constrained environments, it combines speed, compactness, and high accuracy. This version has been fine-tuned using the MetaMathQA-50k section of the HuggingFaceTB/smoltalk dataset to enhance its mathematical reasoning and problem-solving abilities.

Links

Tags

dolphin3.0-llama3.2-1b

Dolphin 3.0 is the next generation of the Dolphin series of instruct-tuned models. Designed to be the ultimate general purpose local model, enabling coding, math, agentic, function calling, and general use cases. Dolphin aims to be a general purpose model, similar to the models behind ChatGPT, Claude, Gemini. But these models present problems for businesses seeking to include AI in their products. They maintain control of the system prompt, deprecating and changing things as they wish, often causing software to break. They maintain control of the model versions, sometimes changing things silently, or deprecating older models that your business relies on. They maintain control of the alignment, and in particular the alignment is one-size-fits all, not tailored to the application. They can see all your queries and they can potentially use that data in ways you wouldn't want. Dolphin, in contrast, is steerable and gives control to the system owner. You set the system prompt. You decide the alignment. You have control of your data. Dolphin does not impose its ethics or guidelines on you. You are the one who decides the guidelines. Dolphin belongs to YOU, it is your tool, an extension of your will. Just as you are personally responsible for what you do with a knife, gun, fire, car, or the internet, you are the creator and originator of any content you generate with Dolphin.

Links

Tags

Model Gallery

Filter by type:

Filter by tags:

minicpm5-1b-claude-opus-fable5-thinking

serenity-26b-a4b

gemma-4-31b-it-qat-q4_0

lfm2.5-8b-a1b

gemma-4-31b-it

baidu_ernie-4.5-21b-a3b-thinking

liquidai_lfm2-8b-a1b

qwen3-8b-jailbroken

minicpm5-1b

gemma-3-1b-it

huihui-ai_gemma-3-1b-it-abliterated

amoral-gemma3-1b-v2

falcon3-1b-instruct

falcon3-1b-instruct-abliterated

granite-3.0-1b-a400m-instruct

moe-girl-1ba-7bt-i1

llama-3.2-1b-instruct:q4_k_m

llama-3.2-1b-instruct:q8_0

llama-smoltalk-3.2-1b-instruct

fastllama-3.2-1b-instruct

dolphin3.0-llama3.2-1b