Model Gallery

13 models from 1 repository

qwen3-4b-esper3-i1
Esper 3 is a coding, architecture, and DevOps reasoning specialist built on Qwen 3. Fine-tuned on our DevOps, architecture, and code reasoning data generated with DeepSeek R1! Improved general and creative reasoning to supplement problem-solving and general chat performance. Small model sizes allow running on local desktop and mobile, plus super-fast server inference!

Repository: localai | License: apache-2.0

qwen3-8b-shiningvaliant3
Shining Valiant 3 is a science, AI design, and general reasoning specialist built on Qwen 3. Fine-tuned on our newest science reasoning data generated with DeepSeek R1 0528! AI to build AI: our high-difficulty AI reasoning data makes Shining Valiant 3 your friend for building with current AI tech and discovering new innovations and improvements! Improved general and creative reasoning to supplement problem-solving and general chat performance. Small model sizes allow running on local desktop and mobile, plus super-fast server inference!

Repository: localai | License: apache-2.0

gemma-3-27b-it
google/gemma-3-27b-it is an open-weight, state-of-the-art vision-language model built from the same research and technology used to create the Gemini models. It is multimodal, handling text and image input and generating text output, with open weights for both pre-trained variants and instruction-tuned variants. Gemma 3 models have a large, 128K context window, multilingual support in over 140 languages, and are available in more sizes than previous versions. They are well-suited for a variety of text generation and image understanding tasks, including question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in environments with limited resources such as laptops, desktops, or your own cloud infrastructure, democratizing access to state-of-the-art AI models and helping foster innovation for everyone.

Repository: localai | License: gemma

gemma-3-12b-it
google/gemma-3-12b-it is an open-source, state-of-the-art, lightweight, multimodal model built from the same research and technology used to create the Gemini models. It is capable of handling text and image input and generating text output. It has a large context window of 128K tokens and supports over 140 languages. The 12B variant has been fine-tuned using the instruction-tuning approach. Gemma 3 models are suitable for a variety of text generation and image understanding tasks, including question answering, summarization, and reasoning. Their relatively small size makes them deployable in environments with limited resources such as laptops, desktops, or your own cloud infrastructure.

Repository: localai | License: gemma

gemma-3-4b-it
Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. Gemma 3 models are multimodal, handling text and image input and generating text output, with open weights for both pre-trained variants and instruction-tuned variants. Gemma 3 has a large, 128K context window, multilingual support in over 140 languages, and is available in more sizes than previous versions. Gemma 3 models are well-suited for a variety of text generation and image understanding tasks, including question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in environments with limited resources such as laptops, desktops or your own cloud infrastructure, democratizing access to state of the art AI models and helping foster innovation for everyone. Gemma-3-4b-it is a 4 billion parameter model.

Repository: localai | License: gemma

gemma-3-1b-it
google/gemma-3-1b-it is a large language model with 1 billion parameters. It is part of the Gemma family of open, state-of-the-art models from Google, built from the same research and technology used to create the Gemini models. Gemma 3 models are multimodal, handling text and image input and generating text output, with open weights for both pre-trained variants and instruction-tuned variants. These models have multilingual support in over 140 languages, and are available in more sizes than previous versions. They are well-suited for a variety of text generation and image understanding tasks, including question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in environments with limited resources such as laptops, desktops or your own cloud infrastructure, democratizing access to state of the art AI models and helping foster innovation for everyone.

Repository: localai | License: gemma

gemma-3-270m-it-qat
Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. Gemma 3 models are multimodal, handling text and image input and generating text output, with open weights for both pre-trained variants and instruction-tuned variants. Gemma 3 has a large, 128K context window, multilingual support in over 140 languages, and is available in more sizes than previous versions. Gemma 3 models are well-suited for a variety of text generation and image understanding tasks, including question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in environments with limited resources such as laptops, desktops, or your own cloud infrastructure, democratizing access to state-of-the-art AI models and helping foster innovation for everyone. This model is a QAT (quantization-aware training) version of the Gemma 3 270M model. It is quantized to 4-bit precision, which means that the model's weights are stored in a 4-bit format instead of a wider one such as 16-bit. This reduces the memory footprint of the model and can make inference faster.

Repository: localai | License: gemma
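
To make the 4-bit quantization mentioned in the gemma-3-270m-it-qat entry concrete, here is a minimal Python sketch. It assumes symmetric per-block integer quantization with one scale per block, which is a common scheme but not necessarily the exact format this model uses. The function names and the sample block are hypothetical, and the parameter count is taken from the nominal "270M" in the model name.

```python
def quantize_block(weights, bits=4):
    """Symmetric integer quantization of one block of float weights.

    Assumption: one shared scale per block; real formats may differ.
    """
    qmax = 2 ** (bits - 1) - 1                    # 7 for signed 4-bit codes
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest code and clamp into [-qmax, qmax].
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_block(codes, scale):
    """Recover approximate float weights from integer codes."""
    return [c * scale for c in codes]


def weight_bytes(n_params, bits_per_weight):
    """Raw weight storage in bytes, ignoring per-block scale overhead."""
    return n_params * bits_per_weight / 8


# Hypothetical block of weights, for illustration only.
block = [0.12, -0.5, 0.33, 0.07, -0.21, 0.49, 0.0, -0.02]
codes, scale = quantize_block(block)
restored = dequantize_block(codes, scale)
max_err = max(abs(a - b) for a, b in zip(block, restored))

# Nominal 270M parameters: 4-bit vs 16-bit raw weight storage.
size_4bit = weight_bytes(270_000_000, 4)          # 135,000,000 bytes
size_16bit = weight_bytes(270_000_000, 16)        # 540,000,000 bytes
```

Under these assumptions the raw weights shrink by a factor of four relative to 16-bit storage, at the cost of a rounding error of at most half a quantization step per weight. Quantization-aware training is meant to make the model tolerate that error.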

archangel_sft_pythia2-8b
Datasets: stanfordnlp/SHP, Anthropic/hh-rlhf, OpenAssistant/oasst1. This repo contains the model checkpoints for the pythia2-8b model family, optimized with the SFT loss and aligned using the SHP, Anthropic HH, and Open Assistant datasets. Please refer to our [code repository](https://github.com/ContextualAI/HALOs) or [blog](https://contextual.ai/better-cheaper-faster-llm-alignment-with-kto/), which contain instructions for training your own HALOs and links to our model cards.

Repository: localai | License: apache-2.0

mn-lulanum-12b-fix-i1
This model was merged using the della_linear merge method, with unsloth/Mistral-Nemo-Base-2407 as the base. The following models were included in the merge: VAGOsolutions/SauerkrautLM-Nemo-12b-Instruct, anthracite-org/magnum-v2.5-12b-kto, Undi95/LocalC-12B-e2.0, NeverSleep/Lumimaid-v0.2-12B.

Repository: localai | License: apache-2.0

magnum-12b-v2.5-kto-i1
v2.5 KTO is an experimental release; we are testing a hybrid reinforcement learning strategy of KTO + DPOP. The "rejected" examples are sampled from the original model, and the "chosen" examples come from the original finetuning dataset. This was done on a limited portion of primarily instruction-following data; we plan to scale up a larger KTO dataset in the future for better generalization. This is the 5th in a series of models designed to replicate the prose quality of the Claude 3 models, specifically Sonnet and Opus. This model is fine-tuned on top of anthracite-org/magnum-12b-v2.

Repository: localai | License: apache-2.0

mn-12b-mag-mell-r1-iq-arm-imatrix
This is a merge of pre-trained language models created using mergekit. Mag Mell is a multi-stage merge, inspired by hyper-merges like Tiefighter and Umbral Mind. It is intended to be a general-purpose "Best of Nemo" model for any fictional, creative use case. Six models were chosen based on three categories; they were then paired up and merged via layer-weighted SLERP to create intermediate "specialists", which were then evaluated in their domain. The specialists were then merged into the base via DARE-TIES, with hyperparameters chosen to reduce interference caused by the overlap of the three domains. The idea with this approach is to extract the best qualities of each component part, and produce models whose task vectors represent more than the sum of their parts. The three specialists are as follows: Hero (RP, kink/trope coverage): Chronos Gold, Sunrose; Monk (intelligence, groundedness): Bophades, Wissenschaft; Deity (prose, flair): Gutenberg v4, Magnum 2.5 KTO. I've been dreaming about this merge since Nemo tunes started coming out in earnest. From our testing, Mag Mell demonstrates worldbuilding capabilities unlike any model in its class, comparable to old adventuring models like Tiefighter, and prose that exhibits minimal "slop" (not bad for no finetuning), frequently devising electrifying metaphors that left us consistently astonished. I don't want to toot my own bugle though; I'm really proud of how this came out, but please leave your feedback, good or bad. Special thanks as usual to Toaster for his feedback and Fizz for helping fund compute, as well as the KoboldAI Discord for their resources. The following models were included in the merge: IntervitensInc/Mistral-Nemo-Base-2407-chatml, nbeerbower/mistral-nemo-bophades-12B, nbeerbower/mistral-nemo-wissenschaft-12B, elinas/Chronos-Gold-12B-1.0, Fizzarolli/MN-12b-Sunrose, nbeerbower/mistral-nemo-gutenberg-12B-v4, anthracite-org/magnum-12b-v2.5-kto.

Repository: localai | License: unlicense
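
The SLERP step named in the mn-12b-mag-mell-r1-iq-arm-imatrix entry can be illustrated with a small Python sketch of spherical linear interpolation between two flattened weight vectors. This is a generic textbook form under stated assumptions, not mergekit's implementation. The function name, the `eps` threshold, and the fallback to plain linear interpolation are illustrative choices. A "layer-weighted" merge would presumably use a different interpolation factor `t` per layer, which is not modelled here.

```python
import math


def slerp(a, b, t, eps=1e-8):
    """Spherical linear interpolation between vectors a and b at factor t.

    Falls back to linear interpolation when a vector is near zero or the
    two vectors are nearly parallel (an assumption made for this sketch).
    """
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))

    def lerp():
        return [(1 - t) * x + t * y for x, y in zip(a, b)]

    if norm_a < eps or norm_b < eps:
        return lerp()

    # Angle between the two vectors, with the cosine clamped for safety.
    cos_omega = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    cos_omega = max(-1.0, min(1.0, cos_omega))
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)
    if abs(sin_omega) < eps:
        return lerp()

    # Standard SLERP weights applied to the original vectors.
    w_a = math.sin((1 - t) * omega) / sin_omega
    w_b = math.sin(t * omega) / sin_omega
    return [w_a * x + w_b * y for x, y in zip(a, b)]


# Halfway between two orthogonal unit vectors stays on the unit circle.
mid = slerp([1.0, 0.0], [0.0, 1.0], 0.5)
```

Unlike plain averaging, which would give a vector of length about 0.707 in this example, the interpolated vector keeps unit length. That is the usual motivation for preferring SLERP when blending two sets of weights.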

rei-v3-kto-12b
Taking the previous 12B trained with Subsequence Loss, this model is meant to refine the base's sharp edges and increase coherency, intelligence, and prose while replicating the prose of the Claude models Opus and Sonnet. Fine-tuned on top of Rei-V3-12B-Base, Rei-12B is designed to replicate the prose quality of Claude 3 models, particularly Sonnet and Opus, using a prototype Magnum V5 datamix.

Repository: localai | License: apache-2.0

localvqe-v1.1-1.3m
LocalVQE v1.1 (1.3 M parameters, F32) — joint acoustic echo cancellation, noise suppression, and dereverberation for 16 kHz mono speech. DeepVQE-style architecture with an S4D bottleneck and an in-graph DCT-II filterbank. ~9.6× realtime on a desktop CPU; 16 ms algorithmic latency. ~5 MB on disk. v1.1 ships the v16 echoaware checkpoint with improved double-talk and near-end single-talk AECMOS scores.

Repository: localai | License: apache-2.0