LocalAI - Models

llama-3.2-1b-instruct:q4_k_m

The Meta Llama 3.2 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models in 1B and 3B sizes (text in/text out). The Llama 3.2 instruction-tuned text only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks. They outperform many of the available open source and closed chat models on common industry benchmarks. Model Developer: Meta Model Architecture: Llama 3.2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.

Links

https://huggingface.co/hugging-quants/Llama-3.2-1B-Instruct-Q4_K_M-GGUF

Tags

llama-3.2-3b-instruct:q4_k_m

The Meta Llama 3.2 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models in 1B and 3B sizes (text in/text out). The Llama 3.2 instruction-tuned text only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks. They outperform many of the available open source and closed chat models on common industry benchmarks. Model Developer: Meta Model Architecture: Llama 3.2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.

Links

https://huggingface.co/hugging-quants/Llama-3.2-3B-Instruct-Q4_K_M-GGUF

Tags

llama-3.2-3b-instruct:q8_0

The Meta Llama 3.2 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models in 1B and 3B sizes (text in/text out). The Llama 3.2 instruction-tuned text only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks. They outperform many of the available open source and closed chat models on common industry benchmarks. Model Developer: Meta Model Architecture: Llama 3.2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.

Links

https://huggingface.co/hugging-quants/Llama-3.2-3B-Instruct-Q8_0-GGUF

Tags

llama-3.2-1b-instruct:q8_0

The Meta Llama 3.2 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models in 1B and 3B sizes (text in/text out). The Llama 3.2 instruction-tuned text only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks. They outperform many of the available open source and closed chat models on common industry benchmarks. Model Developer: Meta Model Architecture: Llama 3.2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.

Links

https://huggingface.co/hugging-quants/Llama-3.2-1B-Instruct-Q8_0-GGUF

Tags

versatillama-llama-3.2-3b-instruct-abliterated

Small but Smart Fine-Tuned on Vast dataset of Conversations. Able to Generate Human like text with high performance within its size. It is Very Versatile when compared for it's size and Parameters and offers capability almost as good as Llama 3.1 8B Instruct.

Links

https://huggingface.co/QuantFactory/VersatiLlama-Llama-3.2-3B-Instruct-Abliterated-GGUF

Tags

llama3.2-3b-enigma

Enigma is a code-instruct model built on Llama 3.2 3b. It is a high quality code instruct model with the Llama 3.2 Instruct chat format. The model is finetuned on synthetic code-instruct data generated with Llama 3.1 405b and supplemented with generalist synthetic data. It uses the Llama 3.2 Instruct prompt format.

Links

https://huggingface.co/QuantFactory/Llama3.2-3B-Enigma-GGUF

Tags

llama3.2-3b-esper2

Esper 2 is a DevOps and cloud architecture code specialist built on Llama 3.2 3b. It is an AI assistant focused on AWS, Azure, GCP, Terraform, Dockerfiles, pipelines, shell scripts and more, with real world problem solving and high quality code instruct performance within the Llama 3.2 Instruct chat format. Finetuned on synthetic DevOps-instruct and code-instruct data generated with Llama 3.1 405b and supplemented with generalist chat data.

Links

https://huggingface.co/QuantFactory/Llama3.2-3B-Esper2-GGUF

Tags

llama-3.2-3b-agent007

The model is a quantized version of EpistemeAI/Llama-3.2-3B-Agent007, developed by EpistemeAI and fine-tuned from unsloth/llama-3.2-3b-instruct-bnb-4bit. It was trained 2x faster with Unsloth and Huggingface's TRL library. Fine tuned with Agent datasets.

Links

https://huggingface.co/QuantFactory/Llama-3.2-3B-Agent007-GGUF

Tags

llama-3.2-3b-agent007-coder

The Llama-3.2-3B-Agent007-Coder-GGUF is a quantized version of the EpistemeAI/Llama-3.2-3B-Agent007-Coder model, which is a fine-tuned version of the unsloth/llama-3.2-3b-instruct-bnb-4bit model. It is created using llama.cpp and trained with additional datasets such as the Agent dataset, Code Alpaca 20K, and magpie ultra 0.1. This model is optimized for multilingual dialogue use cases and agentic retrieval and summarization tasks. The model is available for commercial and research use in multiple languages and is best used with the transformers library.

Links

https://huggingface.co/QuantFactory/Llama-3.2-3B-Agent007-Coder-GGUF

Tags

fireball-meta-llama-3.2-8b-instruct-agent-003-128k-code-dpo

The LLM model is a quantized version of EpistemeAI/Fireball-Meta-Llama-3.2-8B-Instruct-agent-003-128k-code-DPO, which is an experimental and revolutionary fine-tune with DPO dataset to allow LLama 3.1 8B to be an agentic coder. It has some built-in agent features such as search, calculator, and ReAct. Other noticeable features include self-learning using unsloth, RAG applications, and memory. The context window of the model is 128K. It can be integrated into projects using popular libraries like Transformers and vLLM. The model is suitable for use with Langchain or LLamaIndex. The model is developed by EpistemeAI and licensed under the Apache 2.0 license.

Links

https://huggingface.co/QuantFactory/Fireball-Meta-Llama-3.2-8B-Instruct-agent-003-128k-code-DPO-GGUF

Tags

llama-3.2-chibi-3b

Small parameter LLMs are ideal for navigating the complexities of the Japanese language, which involves multiple character systems like kanji, hiragana, and katakana, along with subtle social cues. Despite their smaller size, these models are capable of delivering highly accurate and context-aware results, making them perfect for use in environments where resources are constrained. Whether deployed on mobile devices with limited processing power or in edge computing scenarios where fast, real-time responses are needed, these models strike the perfect balance between performance and efficiency, without sacrificing quality or speed.

Links

Tags

llama-3.2-3b-reasoning-time

Lyte/Llama-3.2-3B-Reasoning-Time is a large language model with 3.2 billion parameters, designed for reasoning and time-based tasks in English. It is based on the Llama architecture and has been quantized using the GGUF format by mradermacher.

Links

https://huggingface.co/mradermacher/Llama-3.2-3B-Reasoning-Time-GGUF

Tags

llama-3.2-sun-2.5b-chat

Base Model Llama 3.2 1B Extended Size 1B to 2.5B parameters Extension Method Proprietary technique developed by MedIT Solutions Fine-tuning Open (or open subsets allowing for commercial use) open datasets from HF Open (or open subsets allowing for commercial use) SFT datasets from HF Training Status Current version: chat-1.0.0 Key Features Built on Llama 3.2 architecture Expanded from 1B to 2.47B parameters Optimized for open-ended conversations Incorporates supervised fine-tuning for improved performance Use Case General conversation and task-oriented interactions

Links

Tags

llama-3.2-3b-instruct-uncensored

This is an uncensored version of the original Llama-3.2-3B-Instruct, created using mlabonne's script, which builds on FailSpy's notebook and the original work from Andy Arditi et al..

Links

Tags

llama3.2-3b-enigma

Enigma is a code-instruct model built on Llama 3.2 3b. It is a high quality code instruct model with the Llama 3.2 Instruct chat format. The model is finetuned on synthetic code-instruct data generated with Llama 3.1 405b and supplemented with generalist synthetic data. It uses the Llama 3.2 Instruct prompt format.

Links

https://huggingface.co/QuantFactory/Llama3.2-3B-Enigma-GGUF

Tags

llama3.2-3b-shiningvaliant2-i1

Shining Valiant 2 is a chat model built on Llama 3.2 3b, finetuned on our data for friendship, insight, knowledge and enthusiasm. Finetuned on meta-llama/Llama-3.2-3B-Instruct for best available general performance Trained on a variety of high quality data; focused on science, engineering, technical knowledge, and structured reasoning Also available for Llama 3.1 70b and Llama 3.1 8b! Version This is the 2024-09-27 release of Shining Valiant 2 for Llama 3.2 3b.

Links

Tags

onellm-doey-v1-llama-3.2-3b

This model is a fine-tuned version of LLaMA 3.2-3B, optimized using LoRA (Low-Rank Adaptation) on the NVIDIA ChatQA-Training-Data. It is tailored for conversational AI, question answering, and other instruction-following tasks, with support for sequences up to 1024 tokens.

Links

Tags

fusechat-llama-3.2-3b-instruct

We present FuseChat-3.0, a series of models crafted to enhance performance by integrating the strengths of multiple source LLMs into more compact target LLMs. To achieve this fusion, we utilized four powerful source LLMs: Gemma-2-27B-It, Mistral-Large-Instruct-2407, Qwen-2.5-72B-Instruct, and Llama-3.1-70B-Instruct. For the target LLMs, we employed three widely-used smaller models—Llama-3.1-8B-Instruct, Gemma-2-9B-It, and Qwen-2.5-7B-Instruct—along with two even more compact models—Llama-3.2-3B-Instruct and Llama-3.2-1B-Instruct. The implicit model fusion process involves a two-stage training pipeline comprising Supervised Fine-Tuning (SFT) to mitigate distribution discrepancies between target and source LLMs, and Direct Preference Optimization (DPO) for learning preferences from multiple source LLMs. The resulting FuseChat-3.0 models demonstrated substantial improvements in tasks related to general conversation, instruction following, mathematics, and coding. Notably, when Llama-3.1-8B-Instruct served as the target LLM, our fusion approach achieved an average improvement of 6.8 points across 14 benchmarks. Moreover, it showed significant improvements of 37.1 and 30.1 points on instruction-following test sets AlpacaEval-2 and Arena-Hard respectively. We have released the FuseChat-3.0 models on Huggingface, stay tuned for the forthcoming dataset and code.

Links

Tags

fastllama-3.2-1b-instruct

FastLlama is a highly optimized version of the Llama-3.2-1B-Instruct model. Designed for superior performance in constrained environments, it combines speed, compactness, and high accuracy. This version has been fine-tuned using the MetaMathQA-50k section of the HuggingFaceTB/smoltalk dataset to enhance its mathematical reasoning and problem-solving abilities.

Links

Tags

minithinky-v2-1b-llama-3.2

This is the newer checkpoint of MiniThinky-1B-Llama-3.2 (version 1), which the loss decreased from 0.7 to 0.5

Links

Tags

menlo_rezero-v0.1-llama-3.2-3b-it-grpo-250404

ReZero trains a small language model to develop effective search behaviors instead of memorizing static data. It interacts with multiple synthetic search engines, each with unique retrieval mechanisms, to refine queries and persist in searching until it finds exact answers. The project focuses on reinforcement learning, preventing overfitting, and optimizing for efficiency in real-world search applications.

Links

Tags

Model Gallery

Filter by type:

Filter by tags:

llama-3.2-1b-instruct:q4_k_m

llama-3.2-3b-instruct:q4_k_m

llama-3.2-3b-instruct:q8_0

llama-3.2-1b-instruct:q8_0

versatillama-llama-3.2-3b-instruct-abliterated

llama3.2-3b-enigma

llama3.2-3b-esper2

llama-3.2-3b-agent007

llama-3.2-3b-agent007-coder

fireball-meta-llama-3.2-8b-instruct-agent-003-128k-code-dpo

llama-3.2-chibi-3b

llama-3.2-3b-reasoning-time

llama-3.2-sun-2.5b-chat

llama-3.2-3b-instruct-uncensored

llama3.2-3b-enigma

llama3.2-3b-shiningvaliant2-i1

onellm-doey-v1-llama-3.2-3b

fusechat-llama-3.2-3b-instruct

fastllama-3.2-1b-instruct

minithinky-v2-1b-llama-3.2

menlo_rezero-v0.1-llama-3.2-3b-it-grpo-250404