LocalAI - Models

ibm-granite_granite-4.0-h-small

Granite-4.0-H-Small is a 32B parameter long-context instruct model finetuned from Granite-4.0-H-Small-Base using a combination of open source instruction datasets with permissive license and internally collected synthetic datasets. This model is developed using a diverse set of techniques with a structured chat format, including supervised finetuning, model alignment using reinforcement learning, and model merging. Granite 4.0 instruct models feature improved instruction following (IF) and tool-calling capabilities, making them more effective in enterprise applications.

Links

Tags

ibm-granite_granite-4.0-h-tiny

Granite-4.0-H-Tiny is a 7B parameter long-context instruct model finetuned from Granite-4.0-H-Tiny-Base using a combination of open source instruction datasets with permissive license and internally collected synthetic datasets. This model is developed using a diverse set of techniques with a structured chat format, including supervised finetuning, model alignment using reinforcement learning, and model merging. Granite 4.0 instruct models feature improved instruction following (IF) and tool-calling capabilities, making them more effective in enterprise applications.

Links

Tags

ibm-granite_granite-4.0-h-micro

Granite-4.0-H-Micro is a 3B parameter long-context instruct model finetuned from Granite-4.0-H-Micro-Base using a combination of open source instruction datasets with permissive license and internally collected synthetic datasets. This model is developed using a diverse set of techniques with a structured chat format, including supervised finetuning, model alignment using reinforcement learning, and model merging. Granite 4.0 instruct models feature improved instruction following (IF) and tool-calling capabilities, making them more effective in enterprise applications.

Links

Tags

ibm-granite_granite-4.0-micro

Granite-4.0-Micro is a 3B parameter long-context instruct model finetuned from Granite-4.0-Micro-Base using a combination of open source instruction datasets with permissive license and internally collected synthetic datasets. This model is developed using a diverse set of techniques with a structured chat format, including supervised finetuning, model alignment using reinforcement learning, and model merging. Granite 4.0 instruct models feature improved instruction following (IF) and tool-calling capabilities, making them more effective in enterprise applications.

Links

Tags

allura-org_remnant-qwen3-8b

There's a wisp of dust in the air. It feels like its from a bygone era, but you don't know where from. It lands on your tongue. It tastes nice. Remnant is a series of finetuned LLMs focused on SFW and NSFW roleplaying and conversation.

Links

Tags

allura-org_q3-30b-a3b-designant

Intended as a direct upgrade to Pentiment, Q3-30B-A3B-Designant is a roleplaying model finetuned from Qwen3-30B-A3B-Base. During testing, Designant punched well above its weight class in terms of active parameters, demonstrating the potential for well-made lightweight Mixture of Experts models in the roleplay scene. While one tester observed looping behavior, repetition in general was minimal.

Links

Tags

nbeerbower_qwen3-gutenberg-encore-14b

nbeerbower/Xiaolong-Qwen3-14B finetuned on: jondurbin/gutenberg-dpo-v0.1 nbeerbower/gutenberg2-dpo nbeerbower/gutenberg-moderne-dpo nbeerbower/synthetic-fiction-dpo nbeerbower/Arkhaios-DPO nbeerbower/Purpura-DPO nbeerbower/Schule-DPO

Links

Tags

allura-org_q3-8b-kintsugi

Q3-8B-Kintsugi is a roleplaying model finetuned from Qwen3-8B-Base. During testing, Kintsugi punched well above its weight class in terms of parameters, especially for 1-on-1 roleplaying and general storywriting.

Links

Tags

gustavecortal_beck-8b

A language model that handles delicate life situations and tries to really help you. Beck is based on Piaget and was finetuned on psychotherapeutic preferences from PsychoCounsel-Preference. Methodology Beck was trained using preference optimization (ORPO) and LoRA. You can reproduce the results using my repo for lightweight preference optimization using this config that contains the hyperparameters. This work was performed using HPC resources (Jean Zay supercomputer) from GENCI-IDRIS (Grant 20XX-AD011014205). Inspiration Beck aims to reason about psychological and philosophical concepts such as self-image, emotion, and existence. Beck was inspired by my position paper on emotion analysis: Improving Language Models for Emotion Analysis: Insights from Cognitive Science.

Links

Tags

gustavecortal_beck-0.6b

A language model that handles delicate life situations and tries to really help you. Beck is based on Piaget and was finetuned on psychotherapeutic preferences from PsychoCounsel-Preference. Methodology Beck was trained using preference optimization (ORPO) and LoRA. You can reproduce the results using my repo for lightweight preference optimization using this config that contains the hyperparameters. This work was performed using HPC resources (Jean Zay supercomputer) from GENCI-IDRIS (Grant 20XX-AD011014205). Inspiration Beck aims to reason about psychological and philosophical concepts such as self-image, emotion, and existence. Beck was inspired by my position paper on emotion analysis: Improving Language Models for Emotion Analysis: Insights from Cognitive Science.

Links

Tags

gustavecortal_beck-1.7b

A language model that handles delicate life situations and tries to really help you. Beck is based on Piaget and was finetuned on psychotherapeutic preferences from PsychoCounsel-Preference. Methodology Beck was trained using preference optimization (ORPO) and LoRA. You can reproduce the results using my repo for lightweight preference optimization using this config that contains the hyperparameters. This work was performed using HPC resources (Jean Zay supercomputer) from GENCI-IDRIS (Grant 20XX-AD011014205). Inspiration Beck aims to reason about psychological and philosophical concepts such as self-image, emotion, and existence. Beck was inspired by my position paper on emotion analysis: Improving Language Models for Emotion Analysis: Insights from Cognitive Science.

Links

Tags

gustavecortal_beck-4b

A language model that handles delicate life situations and tries to really help you. Beck is based on Piaget and was finetuned on psychotherapeutic preferences from PsychoCounsel-Preference. Methodology Beck was trained using preference optimization (ORPO) and LoRA. You can reproduce the results using my repo for lightweight preference optimization using this config that contains the hyperparameters. This work was performed using HPC resources (Jean Zay supercomputer) from GENCI-IDRIS (Grant 20XX-AD011014205). Inspiration Beck aims to reason about psychological and philosophical concepts such as self-image, emotion, and existence. Beck was inspired by my position paper on emotion analysis: Improving Language Models for Emotion Analysis: Insights from Cognitive Science.

Links

Tags

qwen3-4b-ra-sft

a 4B-sized agentic reasoning model that is finetuned with our 3k Agentic SFT dataset, based on Qwen3-4B-Instruct-2507. In our work, we systematically investigate three dimensions of agentic RL: data, algorithms, and reasoning modes. Our findings reveal 🎯 Data Quality Matters: Real end-to-end trajectories and high-diversity datasets significantly outperform synthetic alternatives ⚡ Training Efficiency: Exploration-friendly techniques like reward clipping and entropy maintenance boost training efficiency 🧠 Reasoning Strategy: Deliberative reasoning with selective tool calls surpasses frequent invocation or verbose self-reasoning We contribute high-quality SFT and RL datasets, demonstrating that simple recipes enable even 4B models to outperform 32B models on the most challenging reasoning benchmarks.

Links

Tags

l3.3-70b-magnum-v4-se

The Magnum v4 series is complete, but here's something a little extra I wanted to tack on as I wasn't entirely satisfied with the results of v4 72B. "SE" for Special Edition - this model is finetuned from meta-llama/Llama-3.3-70B-Instruct as an rsLoRA adapter. The dataset is a slightly revised variant of the v4 data with some elements of the v2 data re-introduced. The objective, as with the other Magnum models, is to emulate the prose style and quality of the Claude 3 Sonnet/Opus series of models on a local scale, so don't be surprised to see "Claude-isms" in its output.

Links

Tags

e-n-v-y_legion-v2.1-llama-70b-elarablated-v0.8-hf

This checkpoint was finetuned with a process I'm calling "Elarablation" (a portamenteau of "Elara", which is a name that shows up in AI-generated writing and RP all the time) and "ablation". The idea is to reduce the amount of repetitiveness and "slop" that the model exhibits. In addition to significantly reducing the occurrence of the name "Elara", I've also reduced other very common names that pop up in certain situations. I've also specifically attacked two phrases, "voice barely above a whisper" and "eyes glinted with mischief", which come up a lot less often now. Finally, I've convinced it that it can put a f-cking period after the word "said" because a lot of slop-ish phrases tend to come after "said,". You can check out some of the more technical details in the overview on my github repo, here: https://github.com/envy-ai/elarablate My current focus has been on some of the absolute worst offending phrases in AI creative writing, but I plan to go after RP slop as well. If you run into any issues with this model (going off the rails, repeating tokens, etc), go to the community tab and post the context and parameters in a comment so I can look into it. Also, if you have any "slop" pet peeves, post the context of those as well and I can try to reduce/eliminate them in the next version. The settings I've tested with are temperature at 0.7 and all other filters completely neutral. Other settings may lead to better or worse results.

Links

Tags

opencoder-8b-base

The model is a quantized version of infly/OpenCoder-8B-Base created using llama.cpp. It is part of the OpenCoder LLM family which includes 1.5B and 8B base and chat models, supporting both English and Chinese languages. The original OpenCoder model was pretrained on 2.5 trillion tokens composed of 90% raw code and 10% code-related web data, and supervised finetuned on over 4.5M high-quality SFT examples. It achieves high performance across multiple language model benchmarks and is one of the most comprehensively open-sourced models available.

Links

Tags

opencoder-8b-instruct

The LLM model is QuantFactory/OpenCoder-8B-Instruct-GGUF, which is a quantized version of infly/OpenCoder-8B-Instruct. It is created using llama.cpp and supports both English and Chinese languages. The original model, infly/OpenCoder-8B-Instruct, is pretrained on 2.5 trillion tokens composed of 90% raw code and 10% code-related web data, and supervised finetuned on over 4.5M high-quality SFT examples. It achieves high performance across multiple language model benchmarks and is one of the leading open-source models for code.

Links

Tags

opencoder-1.5b-instruct

The model is a quantized version of [infly/OpenCoder-1.5B-Instruct](https://huggingface.co/infly/OpenCoder-1.5B-Instruct) created using llama.cpp. The original model, infly/OpenCoder-1.5B-Instruct, is an open and reproducible code LLM family which includes 1.5B and 8B base and chat models, supporting both English and Chinese languages. The model is pretrained on 2.5 trillion tokens composed of 90% raw code and 10% code-related web data, and supervised finetuned on over 4.5M high-quality SFT examples. It achieves high performance across multiple language model benchmarks, positioning it among the leading open-source models for code.

Links

https://huggingface.co/QuantFactory/OpenCoder-1.5B-Instruct-GGUF

Tags

granite-3.0-1b-a400m-instruct

Granite 3.0 language models are a new set of lightweight state-of-the-art, open foundation models that natively support multilinguality, coding, reasoning, and tool usage, including the potential to be run on constrained compute resources. All the models are publicly released under an Apache 2.0 license for both research and commercial use. The models' data curation and training procedure were designed for enterprise usage and customization in mind, with a process that evaluates datasets for governance, risk and compliance (GRC) criteria, in addition to IBM's standard data clearance process and document quality checks. Granite 3.0 includes 4 different models of varying sizes: Dense Models: 2B and 8B parameter models, trained on 12 trillion tokens in total. Mixture-of-Expert (MoE) Models: Sparse 1B and 3B MoE models, with 400M and 800M activated parameters respectively, trained on 10 trillion tokens in total. Accordingly, these options provide a range of models with different compute requirements to choose from, with appropriate trade-offs with their performance on downstream tasks. At each scale, we release a base model — checkpoints of models after pretraining, as well as instruct checkpoints — models finetuned for dialogue, instruction-following, helpfulness, and safety.

Links

Tags

llama3.2-3b-enigma

Enigma is a code-instruct model built on Llama 3.2 3b. It is a high quality code instruct model with the Llama 3.2 Instruct chat format. The model is finetuned on synthetic code-instruct data generated with Llama 3.1 405b and supplemented with generalist synthetic data. It uses the Llama 3.2 Instruct prompt format.

Links

https://huggingface.co/QuantFactory/Llama3.2-3B-Enigma-GGUF

Tags

llama3.2-3b-enigma

Enigma is a code-instruct model built on Llama 3.2 3b. It is a high quality code instruct model with the Llama 3.2 Instruct chat format. The model is finetuned on synthetic code-instruct data generated with Llama 3.1 405b and supplemented with generalist synthetic data. It uses the Llama 3.2 Instruct prompt format.

Links

https://huggingface.co/QuantFactory/Llama3.2-3B-Enigma-GGUF

Tags

Model Gallery

Filter by type:

Filter by tags:

ibm-granite_granite-4.0-h-small

ibm-granite_granite-4.0-h-tiny

ibm-granite_granite-4.0-h-micro

ibm-granite_granite-4.0-micro

allura-org_remnant-qwen3-8b

allura-org_q3-30b-a3b-designant

nbeerbower_qwen3-gutenberg-encore-14b

allura-org_q3-8b-kintsugi

gustavecortal_beck-8b

gustavecortal_beck-0.6b

gustavecortal_beck-1.7b

gustavecortal_beck-4b

qwen3-4b-ra-sft

l3.3-70b-magnum-v4-se

e-n-v-y_legion-v2.1-llama-70b-elarablated-v0.8-hf

opencoder-8b-base

opencoder-8b-instruct

opencoder-1.5b-instruct

granite-3.0-1b-a400m-instruct

llama3.2-3b-enigma

llama3.2-3b-enigma