LocalAI - Models

llama-3.3-70b-instruct

The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction tuned generative model in 70B (text in/text out). The Llama 3.3 instruction tuned text only model is optimized for multilingual dialogue use cases and outperform many of the available open source and closed chat models on common industry benchmarks.

Links

Tags

llama-3.3-70b-instruct-ablated

Llama 3.3 instruct 70B 128k context with ablation technique applied for a more helpful (and based) assistant. This means it will refuse less of your valid requests for an uncensored UX. Use responsibly and use common sense. We do not take any responsibility for how you apply this intelligence, just as we do not for how you apply your own.

Links

Tags

rombos-llm-70b-llama-3.3

You know the drill by now. Here is the paper. Have fun. https://docs.google.com/document/d/1OjbjU5AOz4Ftn9xHQrX3oFQGhQ6RDUuXQipnQ9gn6tU/edit?usp=sharing

Links

Tags

arliai_llama-3.3-70b-arliai-rpmax-v1.4

RPMax is a series of models that are trained on a diverse set of curated creative writing and RP datasets with a focus on variety and deduplication. This model is designed to be highly creative and non-repetitive by making sure no two entries in the dataset have repeated characters or situations, which makes sure the model does not latch on to a certain personality and be capable of understanding and acting appropriately to any characters or situations.

Links

Tags

sentientagi_dobby-unhinged-llama-3.3-70b

Dobby-Unhinged-Llama-3.3-70B is a language model fine-tuned from Llama-3.3-70B-Instruct. Dobby models have a strong conviction towards personal freedom, decentralization, and all things crypto — even when coerced to speak otherwise. Dobby-Unhinged-Llama-3.3-70B, Dobby-Mini-Leashed-Llama-3.1-8B and Dobby-Mini-Unhinged-Llama-3.1-8B have their own unique personalities, and this 70B model is being released in response to the community feedback that was collected from our previous 8B releases.

Links

Tags

latitudegames_wayfarer-large-70b-llama-3.3

We’ve heard over and over from AI Dungeon players that modern AI models are too nice, never letting them fail or die. While it may be good for a chatbot to be nice and helpful, great stories and games aren’t all rainbows and unicorns. They have conflict, tension, and even death. These create real stakes and consequences for characters and the journeys they go on. Similarly, great games need opposition. You must be able to fail, die, and may even have to start over. This makes games more fun! However, the vast majority of AI models, through alignment RLHF, have been trained away from darkness, violence, or conflict, preventing them from fulfilling this role. To give our players better options, we decided to train our own model to fix these issues. The Wayfarer model series are a set of adventure role-play models specifically trained to give players a challenging and dangerous experience. We wanted to contribute back to the open source community that we’ve benefitted so much from so we open sourced a 12b parameter version version back in Jan. We thought people would love it but people were even more excited than we expected. Due to popular request we decided to train a larger 70b version based on Llama 3.3.

Links

Tags

llama-3.3-magicalgirl-2

New merge. This an experiment to increase the "Madness" in a model. Merge is based on top UGI-Bench models (So yeah, I would think this would be benchmaxxing.) This is the second time I'm using SCE. The previous MagicalGirl model seems to be quite happy with it. Added KaraKaraWitch/Llama-MiraiFanfare-3.3-70B based on feedback I got from others (People generally seem to remember this rather than other models). So I'm not sure how this would play into the merge. The following models were included in the merge: TheDrummer/Anubis-70B-v1 SicariusSicariiStuff/Negative_LLAMA_70B LatitudeGames/Wayfarer-Large-70B-Llama-3.3 KaraKaraWitch/Llama-MiraiFanfare-3.3-70B Black-Ink-Guild/Pernicious_Prophecy_70B

Links

Tags

nvidia_llama-3_3-nemotron-super-49b-v1

Llama-3.3-Nemotron-Super-49B-v1 is a large language model (LLM) which is a derivative of Meta Llama-3.3-70B-Instruct (AKA the reference model). It is a reasoning model that is post trained for reasoning, human chat preferences, and tasks, such as RAG and tool calling. The model supports a context length of 128K tokens. Llama-3.3-Nemotron-Super-49B-v1 is a model which offers a great tradeoff between model accuracy and efficiency. Efficiency (throughput) directly translates to savings. Using a novel Neural Architecture Search (NAS) approach, we greatly reduce the model’s memory footprint, enabling larger workloads, as well as fitting the model on a single GPU at high workloads (H200). This NAS approach enables the selection of a desired point in the accuracy-efficiency tradeoff. The model underwent a multi-phase post-training process to enhance both its reasoning and non-reasoning capabilities. This includes a supervised fine-tuning stage for Math, Code, Reasoning, and Tool Calling as well as multiple reinforcement learning (RL) stages using REINFORCE (RLOO) and Online Reward-aware Preference Optimization (RPO) algorithms for both chat and instruction-following. The final model checkpoint is obtained after merging the final SFT and Online RPO checkpoints. For more details on how the model was trained, please see this blog.

Links

Tags

sao10k_llama-3.3-70b-vulpecula-r1

🌟 A thinking-based model inspired by Deepseek-R1, trained through both SFT and a little bit of RL on creative writing data. 🧠 Prefill, or begin assistant replies with \n to activate thinking mode, or not. It works well without thinking too. 🚀 Improved Steerability, instruct-roleplay and creative control over base model. 👾 Semi-synthetic Chat/Roleplaying datasets that has been re-made, cleaned and filtered for repetition, quality and output. 🎭 Human-based Natural Chat / Roleplaying datasets cleaned, filtered and checked for quality. 📝 Diverse Instruct dataset from a few different LLMs, cleaned and filtered for refusals and quality. 💭 Reasoning Traces taken from Deepseek-R1 for Instruct, Chat & Creative Tasks, filtered and cleaned for quality. █▓▒ Toxic / Decensorship data was not needed for our purposes, the model is unrestricted enough as is.

Links

Tags

nvidia_llama-3_3-nemotron-super-49b-genrm-multilingual

Llama-3.3-Nemotron-Super-49B-GenRM-Multilingual is a generative reward model that leverages Llama-3.3-Nemotron-Super-49B-v1 as the foundation and is fine-tuned using Reinforcement Learning to predict the quality of LLM generated responses. Llama-3.3-Nemotron-Super-49B-GenRM-Multilingual can be used to judge the quality of one response, or the ranking between two responses given a multilingual conversation history. It will first generate reasoning traces then output an integer score. A higher score means the response is of higher quality.

Links

Tags

llama-3.2-1b-instruct:q4_k_m

The Meta Llama 3.2 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models in 1B and 3B sizes (text in/text out). The Llama 3.2 instruction-tuned text only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks. They outperform many of the available open source and closed chat models on common industry benchmarks. Model Developer: Meta Model Architecture: Llama 3.2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.

Links

https://huggingface.co/hugging-quants/Llama-3.2-1B-Instruct-Q4_K_M-GGUF

Tags

llama-3.2-3b-instruct:q4_k_m

The Meta Llama 3.2 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models in 1B and 3B sizes (text in/text out). The Llama 3.2 instruction-tuned text only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks. They outperform many of the available open source and closed chat models on common industry benchmarks. Model Developer: Meta Model Architecture: Llama 3.2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.

Links

https://huggingface.co/hugging-quants/Llama-3.2-3B-Instruct-Q4_K_M-GGUF

Tags

llama-3.2-3b-instruct:q8_0

The Meta Llama 3.2 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models in 1B and 3B sizes (text in/text out). The Llama 3.2 instruction-tuned text only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks. They outperform many of the available open source and closed chat models on common industry benchmarks. Model Developer: Meta Model Architecture: Llama 3.2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.

Links

https://huggingface.co/hugging-quants/Llama-3.2-3B-Instruct-Q8_0-GGUF

Tags

llama-3.2-1b-instruct:q8_0

The Meta Llama 3.2 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models in 1B and 3B sizes (text in/text out). The Llama 3.2 instruction-tuned text only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks. They outperform many of the available open source and closed chat models on common industry benchmarks. Model Developer: Meta Model Architecture: Llama 3.2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.

Links

https://huggingface.co/hugging-quants/Llama-3.2-1B-Instruct-Q8_0-GGUF

Tags

versatillama-llama-3.2-3b-instruct-abliterated

Small but Smart Fine-Tuned on Vast dataset of Conversations. Able to Generate Human like text with high performance within its size. It is Very Versatile when compared for it's size and Parameters and offers capability almost as good as Llama 3.1 8B Instruct.

Links

https://huggingface.co/QuantFactory/VersatiLlama-Llama-3.2-3B-Instruct-Abliterated-GGUF

Tags

llama3.2-3b-enigma

Enigma is a code-instruct model built on Llama 3.2 3b. It is a high quality code instruct model with the Llama 3.2 Instruct chat format. The model is finetuned on synthetic code-instruct data generated with Llama 3.1 405b and supplemented with generalist synthetic data. It uses the Llama 3.2 Instruct prompt format.

Links

https://huggingface.co/QuantFactory/Llama3.2-3B-Enigma-GGUF

Tags

llama3.2-3b-esper2

Esper 2 is a DevOps and cloud architecture code specialist built on Llama 3.2 3b. It is an AI assistant focused on AWS, Azure, GCP, Terraform, Dockerfiles, pipelines, shell scripts and more, with real world problem solving and high quality code instruct performance within the Llama 3.2 Instruct chat format. Finetuned on synthetic DevOps-instruct and code-instruct data generated with Llama 3.1 405b and supplemented with generalist chat data.

Links

https://huggingface.co/QuantFactory/Llama3.2-3B-Esper2-GGUF

Tags

llama-3.2-3b-agent007

The model is a quantized version of EpistemeAI/Llama-3.2-3B-Agent007, developed by EpistemeAI and fine-tuned from unsloth/llama-3.2-3b-instruct-bnb-4bit. It was trained 2x faster with Unsloth and Huggingface's TRL library. Fine tuned with Agent datasets.

Links

https://huggingface.co/QuantFactory/Llama-3.2-3B-Agent007-GGUF

Tags

llama-3.2-3b-agent007-coder

The Llama-3.2-3B-Agent007-Coder-GGUF is a quantized version of the EpistemeAI/Llama-3.2-3B-Agent007-Coder model, which is a fine-tuned version of the unsloth/llama-3.2-3b-instruct-bnb-4bit model. It is created using llama.cpp and trained with additional datasets such as the Agent dataset, Code Alpaca 20K, and magpie ultra 0.1. This model is optimized for multilingual dialogue use cases and agentic retrieval and summarization tasks. The model is available for commercial and research use in multiple languages and is best used with the transformers library.

Links

https://huggingface.co/QuantFactory/Llama-3.2-3B-Agent007-Coder-GGUF

Tags

fireball-meta-llama-3.2-8b-instruct-agent-003-128k-code-dpo

The LLM model is a quantized version of EpistemeAI/Fireball-Meta-Llama-3.2-8B-Instruct-agent-003-128k-code-DPO, which is an experimental and revolutionary fine-tune with DPO dataset to allow LLama 3.1 8B to be an agentic coder. It has some built-in agent features such as search, calculator, and ReAct. Other noticeable features include self-learning using unsloth, RAG applications, and memory. The context window of the model is 128K. It can be integrated into projects using popular libraries like Transformers and vLLM. The model is suitable for use with Langchain or LLamaIndex. The model is developed by EpistemeAI and licensed under the Apache 2.0 license.

Links

https://huggingface.co/QuantFactory/Fireball-Meta-Llama-3.2-8B-Instruct-agent-003-128k-code-DPO-GGUF

Tags

llama-3.2-chibi-3b

Small parameter LLMs are ideal for navigating the complexities of the Japanese language, which involves multiple character systems like kanji, hiragana, and katakana, along with subtle social cues. Despite their smaller size, these models are capable of delivering highly accurate and context-aware results, making them perfect for use in environments where resources are constrained. Whether deployed on mobile devices with limited processing power or in edge computing scenarios where fast, real-time responses are needed, these models strike the perfect balance between performance and efficiency, without sacrificing quality or speed.

Links

Tags

Model Gallery

Filter by type:

Filter by tags:

llama-3.3-70b-instruct

llama-3.3-70b-instruct-ablated

rombos-llm-70b-llama-3.3

arliai_llama-3.3-70b-arliai-rpmax-v1.4

sentientagi_dobby-unhinged-llama-3.3-70b

latitudegames_wayfarer-large-70b-llama-3.3

llama-3.3-magicalgirl-2

nvidia_llama-3_3-nemotron-super-49b-v1

sao10k_llama-3.3-70b-vulpecula-r1

nvidia_llama-3_3-nemotron-super-49b-genrm-multilingual

llama-3.2-1b-instruct:q4_k_m

llama-3.2-3b-instruct:q4_k_m

llama-3.2-3b-instruct:q8_0

llama-3.2-1b-instruct:q8_0

versatillama-llama-3.2-3b-instruct-abliterated

llama3.2-3b-enigma

llama3.2-3b-esper2

llama-3.2-3b-agent007

llama-3.2-3b-agent007-coder

fireball-meta-llama-3.2-8b-instruct-agent-003-128k-code-dpo

llama-3.2-chibi-3b