Model Gallery

31 models from 1 repository

gemma-3-27b-it
google/gemma-3-27b-it is an open-source, state-of-the-art vision-language model built from the same research and technology used to create the Gemini models. It is multimodal, handling text and image input and generating text output, with open weights for both pre-trained and instruction-tuned variants. Gemma 3 models have a large, 128K context window, multilingual support in over 140 languages, and are available in more sizes than previous versions. They are well-suited for a variety of text generation and image understanding tasks, including question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in environments with limited resources such as laptops, desktops, or your own cloud infrastructure, democratizing access to state-of-the-art AI models and helping foster innovation for everyone.

Repository: localai
License: gemma

gemma-3-12b-it
google/gemma-3-12b-it is an open-source, state-of-the-art, lightweight, multimodal model built from the same research and technology used to create the Gemini models. It is capable of handling text and image input and generating text output. It has a large context window of 128K tokens and supports over 140 languages. The 12B variant has been fine-tuned using the instruction-tuning approach. Gemma 3 models are suitable for a variety of text generation and image understanding tasks, including question answering, summarization, and reasoning. Their relatively small size makes them deployable in environments with limited resources such as laptops, desktops, or your own cloud infrastructure.

Repository: localai
License: gemma

gemma-3-4b-it
Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. Gemma 3 models are multimodal, handling text and image input and generating text output, with open weights for both pre-trained variants and instruction-tuned variants. Gemma 3 has a large, 128K context window, multilingual support in over 140 languages, and is available in more sizes than previous versions. Gemma 3 models are well-suited for a variety of text generation and image understanding tasks, including question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in environments with limited resources such as laptops, desktops or your own cloud infrastructure, democratizing access to state of the art AI models and helping foster innovation for everyone. Gemma-3-4b-it is a 4 billion parameter model.

Repository: localai
License: gemma

gemma-3-1b-it
google/gemma-3-1b-it is a large language model with 1 billion parameters. It is part of the Gemma family of open, state-of-the-art models from Google, built from the same research and technology used to create the Gemini models. Gemma 3 models are multimodal, handling text and image input and generating text output, with open weights for both pre-trained variants and instruction-tuned variants. These models have multilingual support in over 140 languages, and are available in more sizes than previous versions. They are well-suited for a variety of text generation and image understanding tasks, including question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in environments with limited resources such as laptops, desktops or your own cloud infrastructure, democratizing access to state of the art AI models and helping foster innovation for everyone.

Repository: localai
License: gemma

gemma-3-12b-it-qat
This model corresponds to the 12B instruction-tuned version of the Gemma 3 model in GGUF format, built using Quantization Aware Training (QAT). The GGUF corresponds to Q4_0 quantization. Thanks to QAT, the model preserves quality similar to bfloat16 while significantly reducing the memory required to load it. You can find the half-precision version here.
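As a rough illustration of why Q4_0 matters for memory (back-of-the-envelope only: the 12B parameter count is rounded, and real GGUF files mix tensor types and need additional memory for context):

```python
# Rough weight-memory estimate for a ~12B-parameter model (sketch only;
# real GGUF files mix quantization types, so actual sizes differ).
PARAMS = 12_000_000_000          # approximate parameter count
BF16_BITS = 16                   # bfloat16: 16 bits per weight
Q4_0_BITS = 4.5                  # Q4_0: 32 4-bit weights + one fp16 scale per block

bf16_gib = PARAMS * BF16_BITS / 8 / 2**30
q4_0_gib = PARAMS * Q4_0_BITS / 8 / 2**30

print(f"bf16: {bf16_gib:.1f} GiB")   # ~22.4 GiB
print(f"Q4_0: {q4_0_gib:.1f} GiB")   # ~6.3 GiB
```

Weight storage alone shrinks by roughly 3.5x, which is what moves a 12B model from workstation territory into a single consumer GPU.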

Repository: localai
License: gemma

gemma-3-4b-it-qat
This model corresponds to the 4B instruction-tuned version of the Gemma 3 model in GGUF format, built using Quantization Aware Training (QAT). The GGUF corresponds to Q4_0 quantization. Thanks to QAT, the model preserves quality similar to bfloat16 while significantly reducing the memory required to load it. You can find the half-precision version here.

Repository: localai
License: gemma

gemma-3-27b-it-qat
This model corresponds to the 27B instruction-tuned version of the Gemma 3 model in GGUF format, built using Quantization Aware Training (QAT). The GGUF corresponds to Q4_0 quantization. Thanks to QAT, the model preserves quality similar to bfloat16 while significantly reducing the memory required to load it. You can find the half-precision version here.

Repository: localai
License: gemma

qgallouedec_gemma-3-27b-it-codeforces-sft
This model is a fine-tuned version of google/gemma-3-27b-it on the open-r1/codeforces-cots dataset. It has been trained using TRL.

Repository: localai
License: gemma

mlabonne_gemma-3-27b-it-abliterated
This is an uncensored version of google/gemma-3-27b-it created with a new abliteration technique. See this article to learn more about abliteration.
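The linked article describes the actual technique; as a purely illustrative sketch of the underlying idea (toy sizes and random stand-in activations, not the method or code used for this model): estimate a "refusal direction" from the difference in mean activations on harmful vs. harmless prompts, then project it out of a weight matrix so the layer can no longer write along that direction.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # toy hidden size

# Illustrative activations (stand-ins for real residual-stream captures);
# the harmful set is shifted along one axis to create a separable direction.
harmful_acts  = rng.normal(size=(16, d)) + np.array([3.0] + [0.0] * (d - 1))
harmless_acts = rng.normal(size=(16, d))

# Refusal direction: normalized difference of the mean activations.
r = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
r /= np.linalg.norm(r)

# Ablate: W <- (I - r r^T) W, removing the component along r from the
# layer's output so it can no longer write to the refusal direction.
W = rng.normal(size=(d, d))
W_abl = W - np.outer(r, r) @ W

# Outputs of the ablated weights have no component along r
# (up to floating-point noise).
x = rng.normal(size=d)
print(abs(r @ (W_abl @ x)))  # effectively 0
```

Real abliteration applies this projection across many layers of the transformer using directions measured on actual prompts, but the linear algebra is the same.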

Repository: localai
License: gemma

mlabonne_gemma-3-12b-it-abliterated
This is an uncensored version of google/gemma-3-12b-it created with a new abliteration technique. See this article to learn more about abliteration.

Repository: localai
License: gemma

mlabonne_gemma-3-4b-it-abliterated
This is an uncensored version of google/gemma-3-4b-it created with a new abliteration technique. See this article to learn more about abliteration.

Repository: localai
License: gemma

gemma-3-4b-it-uncensored-dbl-x-i1
Google's newest Gemma 3 model, uncensored by David_AU (this maintains instruction following and model performance, and adds 4 layers to the model) and reinforced with an optional system prompt (see below).

Repository: localai
License: apache-2.0

huihui-ai_gemma-3-1b-it-abliterated
This is an uncensored version of google/gemma-3-1b-it created with abliteration (see remove-refusals-with-transformers to learn more about it). This is a crude, proof-of-concept implementation for removing refusals from an LLM without using TransformerLens.

Repository: localai
License: gemma

gemma-3-glitter-12b-i1
A creative writing model based on Gemma 3 12B IT. This is a 50/50 merge of two separate trains:
- ToastyPigeon/g3-12b-rp-system-v0.1: ~13.5M tokens of instruct-based training related to RP (2:1 human to synthetic) and examples using a system prompt
- ToastyPigeon/g3-12b-storyteller-v0.2-textonly: ~20M tokens of completion training on long-form creative writing; 1.6M synthetic from R1, the rest human-created

Repository: localai
License: gemma

gemma-3-starshine-12b-i1
A creative writing model based on a merge of fine-tunes on Gemma 3 12B IT and Gemma 3 12B PT. This is the Story Focused merge. This version works better for storytelling and scenarios, as the prose is more novel-like, and it has a tendency to impersonate the user character. See the Alternate RP Focused version as well. This is a merge of two G3 models, one trained on instruct and one trained on base:
- allura-org/Gemma-3-Glitter-12B: itself a merge of a storywriting and RP train (both also by ToastyPigeon), on instruct
- ToastyPigeon/Gemma-3-Confetti-12B: an experimental application of the Glitter data using base instead of instruct; additionally includes some adventure data in the form of SpringDragon
The result is a lovely blend of Glitter's ability to follow instructions and Confetti's free-spirited prose, effectively 'loosening up' much of the hesitancy that was left in Glitter.

Repository: localai
License: gemma

burtenshaw_gemmacoder3-12b
This model is a fine-tuned version of google/gemma-3-12b-it on the open-r1/codeforces-cots dataset. It has been trained using TRL.

Repository: localai
License: gemma

google-gemma-3-27b-it-qat-q4_0-small
This is a requantized version of https://huggingface.co/google/gemma-3-27b-it-qat-q4_0-gguf. The official QAT weights released by Google use fp16 (instead of Q6_K) for the embeddings table, which makes the model take a significant amount of extra memory (and storage) compared to what Q4_0 quants are supposed to take. Requantizing with llama.cpp achieves a very similar result. Note that this model ends up smaller than the Q4_0 from Bartowski: llama.cpp sets some tensors to Q4_1 when quantizing models to Q4_0 with an imatrix, but this is a static quant. The perplexity score for this model is even slightly lower than that of Google's original, but the results are within the margin of error, so it's probably just luck. I also fixed the control-token metadata, which was slightly degrading the performance of the model in instruct mode.
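The fp16-vs-Q6_K embeddings point can be made concrete with back-of-the-envelope numbers (the ~262k vocabulary and 5376 hidden size assumed here for Gemma 3 27B are approximations, as is the 6.5625 bits-per-weight figure for Q6_K):

```python
# Back-of-the-envelope size of the token-embedding tensor. Figures are
# assumptions: ~262k vocabulary and 5376 hidden size for Gemma 3 27B.
VOCAB, HIDDEN = 262_144, 5_376
params = VOCAB * HIDDEN

FP16_BPW = 16.0     # bits per weight
Q6K_BPW  = 6.5625   # Q6_K: 210 bytes per 256-weight block

fp16_gib = params * FP16_BPW / 8 / 2**30
q6k_gib  = params * Q6K_BPW  / 8 / 2**30

print(f"fp16 embeddings: {fp16_gib:.2f} GiB")  # ~2.62 GiB
print(f"Q6_K embeddings: {q6k_gib:.2f} GiB")   # ~1.08 GiB
print(f"saved:           {fp16_gib - q6k_gib:.2f} GiB")
```

Under these assumptions the fp16 embeddings table alone costs an extra ~1.5 GiB over Q6_K, which is why the requantized file is noticeably smaller.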

Repository: localai
License: gemma

genericrpv3-4b
Part of the GRP / GenericRP series; this is V3, based on Gemma 3 4B and licensed accordingly. It's a simple merge. For the intended behaviour, see the V2 card, which is more detailed. Merge components and weights:
- allura-org/Gemma-3-Glitter-4B: weight 0.5
- huihui-ai/gemma-3-4b-it-abliterated: weight 0.25
- Danielbrdz/Barcenas-4b: weight 0.25
Happy chatting.
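A weighted linear merge with these weights can be sketched as follows (toy single-parameter "models" stand in for the three components; real merges run over full model state dicts, e.g. with tools such as mergekit):

```python
# Minimal sketch of a weighted linear merge: each parameter of the result
# is the weighted average of the corresponding parameters of the inputs.
def merge(state_dicts, weights):
    assert abs(sum(weights) - 1.0) < 1e-9, "merge weights should sum to 1"
    keys = state_dicts[0].keys()
    return {
        k: sum(w * sd[k] for sd, w in zip(state_dicts, weights))
        for k in keys
    }

# Toy one-parameter "models" standing in for the three merge components.
glitter     = {"layer.weight": 1.0}
abliterated = {"layer.weight": 2.0}
barcenas    = {"layer.weight": 4.0}

merged = merge([glitter, abliterated, barcenas], [0.5, 0.25, 0.25])
print(merged["layer.weight"])  # 0.5*1.0 + 0.25*2.0 + 0.25*4.0 = 2.0
```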

Repository: localai
License: gemma

gemma-3-12b-fornaxv.2-qat-cot
This model is an experiment in producing a strong, smaller thinking model with generalizable reasoning capabilities that fits in an 8 GiB consumer graphics card. Most other open-source thinking models, especially the smaller ones, fail to generalize their reasoning to tasks other than coding or math, due to an overly large focus on GRPO-zero training for CoT, which is only applicable to coding and math. Instead of using GRPO, this model uses SFT on a wide variety of high-quality, diverse reasoning traces from DeepSeek R1 applied to Gemma 3, forcing the model to learn to generalize its reasoning capabilities across a large number of tasks, extending the LiMO paper's approach to math/coding CoT. A subset of V3 0324 non-thinking data was also included for improved creativity and to allow the model to retain its non-thinking capabilities. Training from the QAT checkpoint allows this model to be used without a drop in quality at Q4_0, requiring only ~6 GiB of memory.

Thinking mode: similar to the Qwen 3 model line, Gemma Fornax can be used with or without thinking mode enabled. To enable thinking, place /think in the system prompt and prefill <think>\n for thinking mode. To disable thinking, put /no_think in the system prompt.
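These prompt-side switches can be wired into an OpenAI-compatible chat request, which LocalAI exposes; this is only a sketch: the model name is illustrative, and the <think>-tag prefill is an assumption based on R1-style reasoning traces.

```python
import json

# Build an OpenAI-compatible chat payload that toggles thinking mode as the
# card describes: "/think" or "/no_think" in the system prompt, with the
# assistant turn prefilled so generation starts inside a reasoning trace.
def chat_payload(user_msg, thinking=True):
    system = "/think" if thinking else "/no_think"
    messages = [
        {"role": "system", "content": system},
        {"role": "user", "content": user_msg},
    ]
    if thinking:
        # Prefill (assumed R1-style tag) so the model opens a thinking block.
        messages.append({"role": "assistant", "content": "<think>\n"})
    return {"model": "gemma-3-12b-fornaxv.2-qat-cot", "messages": messages}

payload = chat_payload("Why is the sky blue?")
print(json.dumps(payload, indent=2))
```

POSTing this payload to a LocalAI server's /v1/chat/completions endpoint would then return a reply beginning with the reasoning trace.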

Repository: localai
License: gemma

gemma-3n-e2b-it
Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. Gemma 3n models are designed for efficient execution on low-resource devices. They are capable of multimodal input, handling text, image, video, and audio input, and generating text outputs, with open weights for pre-trained and instruction-tuned variants. These models were trained with data in over 140 spoken languages. Gemma 3n models use selective parameter activation technology to reduce resource requirements. This technique allows the models to operate at an effective size of 2B and 4B parameters, which is lower than the total number of parameters they contain. For more information on Gemma 3n's efficient parameter management technology, see the Gemma 3n page.

Repository: localai
License: gemma

gemma-3n-e4b-it
Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. Gemma 3n models are designed for efficient execution on low-resource devices. They are capable of multimodal input, handling text, image, video, and audio input, and generating text outputs, with open weights for pre-trained and instruction-tuned variants. These models were trained with data in over 140 spoken languages. Gemma 3n models use selective parameter activation technology to reduce resource requirements. This technique allows the models to operate at an effective size of 2B and 4B parameters, which is lower than the total number of parameters they contain. For more information on Gemma 3n's efficient parameter management technology, see the Gemma 3n page.

Repository: localai
License: gemma

Page 1