LocalAI - Models

deepseek-v4-flash

# DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence

Technical Report

## Introduction

We present a preview version of the **DeepSeek-V4** series, including two strong Mixture-of-Experts (MoE) language models, **DeepSeek-V4-Pro** with 1.6T parameters (49B activated) and **DeepSeek-V4-Flash** with 284B parameters (13B activated), both supporting a context length of **one million tokens**.

The DeepSeek-V4 series incorporates several key upgrades in architecture and optimization:

1. **Hybrid Attention Architecture:** We design a hybrid attention mechanism combining Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) to dramatically improve long-context efficiency. In the 1M-token context setting, DeepSeek-V4-Pro requires only **27% of single-token inference FLOPs** and **10% of KV cache** compared with DeepSeek-V3.2.
2. **Manifold-Constrained Hyper-Connections (mHC):** We incorporate mHC to strengthen conventional residual connections, enhancing stability of signal propagation across layers while preserving model expressivity.
3. **Muon Optimizer:** We employ the Muon optimizer for faster convergence and greater training stability.

...
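As a quick check on the parameter figures quoted above, the share of parameters activated per token follows directly from the stated totals. This is a minimal sketch: the function name is illustrative, and only the parameter counts come from the description.

```python
def activated_fraction(total_params: float, active_params: float) -> float:
    """Return the share of parameters used per token in an MoE model."""
    return active_params / total_params


# Parameter counts as quoted in the description above.
pro = activated_fraction(1.6e12, 49e9)    # DeepSeek-V4-Pro
flash = activated_fraction(284e9, 13e9)   # DeepSeek-V4-Flash

print(f"DeepSeek-V4-Pro:   {pro:.1%} of parameters active per token")
print(f"DeepSeek-V4-Flash: {flash:.1%} of parameters active per token")
```

Both models activate only a few percent of their weights per token, which is the usual motivation for an MoE design.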

Links

https://huggingface.co/unsloth/DeepSeek-V4-Flash-GGUF

Tags

arcee-ai_afm-4.5b

AFM-4.5B is a 4.5 billion parameter instruction-tuned model developed by Arcee.ai, designed for enterprise-grade performance across diverse deployment environments from cloud to edge. The base model was trained on a dataset of 8 trillion tokens, comprising 6.5 trillion tokens of general pretraining data followed by 1.5 trillion tokens of midtraining data with enhanced focus on mathematical reasoning and code generation. Following pretraining, the model underwent supervised fine-tuning on high-quality instruction datasets. The instruction-tuned model was further refined through reinforcement learning, both on verifiable rewards and for human preference. We use a modified version of TorchTitan for pretraining, Axolotl for supervised fine-tuning, and a modified version of Verifiers for reinforcement learning. The development of AFM-4.5B prioritized data quality as a fundamental requirement for achieving robust model performance. We collaborated with DatologyAI, a company specializing in large-scale data curation. DatologyAI's curation pipeline integrates a suite of proprietary algorithms: model-based quality filtering, embedding-based curation, target distribution-matching, source mixing, and synthetic data. Their expertise enabled the creation of a curated dataset tailored to support strong real-world performance. The model architecture follows a standard transformer decoder-only design based on Vaswani et al., incorporating several key modifications for enhanced performance and efficiency. Notable architectural features include grouped query attention for improved inference efficiency and ReLU^2 activation functions instead of SwiGLU to enable sparsification while maintaining or exceeding performance benchmarks. The model available in this repo is the instruct model following supervised fine-tuning and reinforcement learning.
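To illustrate why ReLU^2 lends itself to sparsification, here is a minimal sketch of the standard elementwise definition (the ReLU output, squared). It is not Arcee's implementation; the function name and the toy inputs are assumptions for illustration.

```python
def relu_squared(x: float) -> float:
    """ReLU^2: exactly zero for non-positive inputs, x squared otherwise."""
    return max(0.0, x) ** 2


# Toy pre-activation values (hypothetical, for illustration only).
inputs = [-2.0, -0.5, 0.0, 0.5, 2.0]
outputs = [relu_squared(x) for x in inputs]

# Fraction of activations that are exactly zero and could be skipped.
sparsity = sum(1 for y in outputs if y == 0.0) / len(outputs)

print(outputs)   # [0.0, 0.0, 0.0, 0.25, 4.0]
print(sparsity)  # 0.6
```

Unlike gated activations such as SwiGLU, every non-positive input maps to an exact zero, so the corresponding downstream multiplications can in principle be skipped.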

Links

Tags

boomerang-qwen3-2.3b

Boomerang distillation is a phenomenon in LLMs where we can distill a teacher model into a student and reincorporate teacher layers to create intermediate-sized models with no additional training. This is the student model distilled from Qwen3-4B-Base from our paper. This model was initialized from Qwen3-4B-Base by copying every other layer and the last 2 layers. It was distilled on 2.1B tokens of the deduplicated Pile, using cross-entropy, KL, and cosine losses to match the activations of Qwen3-4B-Base.
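The initialization described above ("every other layer and the last 2 layers") can be sketched as an index selection. This is only an illustration: the zero-based indexing, the choice to start from layer 0, and the function name are assumptions, and the layer count in the example is a toy value rather than the teacher's real depth.

```python
def student_layer_indices(num_teacher_layers: int, keep_last: int = 2) -> list[int]:
    """Pick every other teacher layer plus the final `keep_last` layers."""
    every_other = set(range(0, num_teacher_layers, 2))
    last = set(range(max(0, num_teacher_layers - keep_last), num_teacher_layers))
    # The union avoids copying a layer twice when both rules select it.
    return sorted(every_other | last)


# Toy 12-layer teacher (hypothetical depth, for illustration only).
print(student_layer_indices(12))  # [0, 2, 4, 6, 8, 10, 11]
```

Keeping the skipped indices around is what would later allow teacher layers to be reinserted between the student's layers to form intermediate-sized models.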

Links

Tags

boomerang-qwen3-4.9b

Boomerang distillation is a phenomenon in LLMs where we can distill a teacher model into a student and reincorporate teacher layers to create intermediate-sized models with no additional training. This is the student model distilled from Qwen3-8B-Base from our paper. This model was initialized from Qwen3-8B-Base by copying every other layer and the last 2 layers. It was distilled on 2.1B tokens of the deduplicated Pile, using cross-entropy, KL, and cosine losses to match the activations of Qwen3-8B-Base.
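The three loss terms named above can be combined as in the following sketch. It is a plain-Python illustration for a single token position, not the authors' training code: the equal weights, the KL direction (teacher relative to student), and all function names are assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(student_logits, target_index):
    """Negative log-likelihood of the ground-truth next token."""
    return -math.log(softmax(student_logits)[target_index])


def kl_divergence(teacher_logits, student_logits):
    """KL(teacher || student) between the two output distributions."""
    p, q = softmax(teacher_logits), softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def cosine_loss(teacher_hidden, student_hidden):
    """1 - cosine similarity between hidden activations."""
    dot = sum(t * s for t, s in zip(teacher_hidden, student_hidden))
    norm_t = math.sqrt(sum(t * t for t in teacher_hidden))
    norm_s = math.sqrt(sum(s * s for s in student_hidden))
    return 1.0 - dot / (norm_t * norm_s)


def distillation_loss(teacher_logits, student_logits, teacher_hidden,
                      student_hidden, target_index,
                      w_ce=1.0, w_kl=1.0, w_cos=1.0):
    """Weighted sum of the three terms (weights are hypothetical)."""
    return (w_ce * cross_entropy(student_logits, target_index)
            + w_kl * kl_divergence(teacher_logits, student_logits)
            + w_cos * cosine_loss(teacher_hidden, student_hidden))


# Toy values: a student that matches the teacher exactly.
logits = [2.0, 1.0, 0.1]
hidden = [0.3, -1.2, 0.8]
loss_when_matched = distillation_loss(logits, logits, hidden, hidden, 0)
```

When the student reproduces the teacher exactly, the KL and cosine terms vanish and only the cross-entropy term remains.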

Links

Tags

l3.3-nevoria-r1-70b

This model builds upon the original Nevoria foundation, incorporating the DeepSeek-R1 reasoning architecture to enhance dialogue interaction and scene comprehension. While maintaining Nevoria's core strengths in storytelling and scene description (derived from EVA, EURYALE, and Anubis), this iteration aims to improve prompt adherence and creative reasoning capabilities. The model also retains the balanced perspective introduced by Negative_LLAMA and Nemotron elements. The model also plays the card almost to a fault: it will pick up on minor issues and attempt to run with them. Users had it call them out for misspelling a word while playing in character. Note: Nevoria-R1 represents a significant architectural change; rather than a direct successor to Nevoria, it operates as a distinct model with its own characteristics. The lorablated model base choice was intentional, creating unique weight interactions similar to the original Astoria model and Astoria V2 model. This "weight twisting" effect, achieved by subtracting the lorablated base model during merging, creates an interesting balance in the model's behavior. While unconventional compared to sequential component application, this approach was chosen for its unique response characteristics.
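The description does not give the merge recipe, so the following is only a generic task-arithmetic style sketch of what "subtracting the base model during merging" can look like: each component contributes its difference from the base, and the weighted differences are added back onto the base. The function name, the weights, and the scalar "parameters" are all hypothetical.

```python
def merge_with_base_subtraction(base, components, weights):
    """Add weighted (component - base) deltas back onto the base weights.

    `base` and each entry of `components` map parameter names to values.
    Scalars stand in for tensors to keep the sketch dependency-free.
    """
    merged = {}
    for name, base_value in base.items():
        delta = sum(w * (model[name] - base_value)
                    for model, w in zip(components, weights))
        merged[name] = base_value + delta
    return merged


# Toy example with a single scalar parameter per model (hypothetical).
base = {"w": 1.0}
components = [{"w": 1.5}, {"w": 0.5}]

print(merge_with_base_subtraction(base, components, [1.0, 0.0]))  # {'w': 1.5}
print(merge_with_base_subtraction(base, components, [0.5, 0.5]))  # {'w': 1.0}
```

Because every delta is measured against the chosen base, swapping in a different base (such as a lorablated one) changes all the deltas at once, which is one plausible reading of the interaction the description refers to.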

Links

Tags

llama-3.2-sun-2.5b-chat

Base Model: Llama 3.2 1B
Extended Size: 1B to 2.5B parameters
Extension Method: Proprietary technique developed by MedIT Solutions
Fine-tuning: Open (or open subsets allowing for commercial use) open datasets from HF; Open (or open subsets allowing for commercial use) SFT datasets from HF
Training Status: Current version: chat-1.0.0
Key Features: Built on Llama 3.2 architecture; Expanded from 1B to 2.47B parameters; Optimized for open-ended conversations; Incorporates supervised fine-tuning for improved performance
Use Case: General conversation and task-oriented interactions

Links

Tags

calme-2.3-legalkit-8b-i1

This model is an advanced iteration of the powerful meta-llama/Meta-Llama-3.1-8B-Instruct, specifically fine-tuned to enhance its capabilities in the legal domain. The fine-tuning process utilized a synthetically generated dataset derived from the French LegalKit, a comprehensive legal language resource. To create this specialized dataset, I used the NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO model in conjunction with Hugging Face's Inference Endpoint. This approach allowed for the generation of high-quality, synthetic data that incorporates Chain of Thought (CoT) and advanced reasoning in its responses. The resulting model combines the robust foundation of Llama-3.1-8B with tailored legal knowledge and enhanced reasoning capabilities. This makes it particularly well-suited for tasks requiring in-depth legal analysis, interpretation, and application of French legal concepts.

Links

Tags

fireball-llama-3.11-8b-v1orpo

Developed by: EpistemeAI
License: apache-2.0
Finetuned from model: unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit
Finetuning methods: DPO (Direct Preference Optimization) and ORPO (Odds Ratio Preference Optimization)
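For readers unfamiliar with the two preference-optimization methods named above, the per-example objectives can be sketched as follows. This shows the general published formulations, not EpistemeAI's training code; the function names, the `beta` value, and the toy log-probabilities are assumptions.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO: -log(sigmoid(beta * (chosen log-ratio - rejected log-ratio)))."""
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(m)) rewritten as log(1 + exp(-m)).
    return math.log1p(math.exp(-margin))


def log_odds(logp):
    """log(p / (1 - p)) for a log-probability logp < 0."""
    return logp - math.log1p(-math.exp(logp))


def orpo_odds_ratio_term(chosen_logp, rejected_logp):
    """ORPO penalty: -log(sigmoid(log-odds(chosen) - log-odds(rejected))).

    In ORPO this term is scaled and added to the usual supervised
    (negative log-likelihood) loss, and no reference model is needed.
    """
    z = log_odds(chosen_logp) - log_odds(rejected_logp)
    return math.log1p(math.exp(-z))


# Toy sequence log-probabilities (hypothetical values).
print(dpo_loss(-5.0, -7.0, -5.0, -7.0))   # policy equals reference: log(2)
print(dpo_loss(-4.0, -8.0, -5.0, -7.0))   # policy prefers chosen more: lower
print(orpo_odds_ratio_term(-1.0, -2.0))   # chosen more likely: below log(2)
```

The main practical difference is that DPO compares the policy against a frozen reference model, while ORPO folds the preference signal into a single supervised fine-tuning pass.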

Links

https://huggingface.co/mradermacher/Fireball-Llama-3.11-8B-v1orpo-GGUF

Tags

llama-3.1-hawkish-8b

The model has been further finetuned on a set of 50m newly generated, high-quality tokens on financial topics such as Economics, Fixed Income, Equities, Corporate Financing, Derivatives and Portfolio Management. The data was gathered from publicly available sources and curated in several stages into instruction data, starting from an initial 250m+ tokens. To help mitigate forgetting of information from the original finetune, the data was mixed with instruction sets on the topics of Coding, General Knowledge, NLP and Conversational Dialogue. The model improves on the original model across a number of benchmarks, notably in Math and Economics. This model represents the first time an 8B model has been able to convincingly get a passing score on the CFA Level 1 exam, which typically requires 300 hours of studying, indicating a significant improvement in Financial Knowledge.

Links

Tags

skywork-o1-open-llama-3.1-8b

We are excited to announce the release of the Skywork o1 Open model series, developed by the Skywork team at Kunlun Inc. This groundbreaking release introduces a series of models that incorporate o1-like slow thinking and reasoning capabilities. The Skywork o1 Open model series includes three advanced models:

Skywork o1 Open-Llama-3.1-8B: A robust chat model trained on Llama-3.1-8B, enhanced significantly with "o1-style" data to improve reasoning skills.
Skywork o1 Open-PRM-Qwen-2.5-1.5B: A specialized model designed to enhance reasoning capability through incremental process rewards, ideal for complex problem solving at a smaller scale.
Skywork o1 Open-PRM-Qwen-2.5-7B: Extends the capabilities of the 1.5B model by scaling up to handle more demanding reasoning tasks, pushing the boundaries of AI reasoning.

Different from mere reproductions of the OpenAI o1 model, the Skywork o1 Open model series not only exhibits innate thinking, planning, and reflecting capabilities in its outputs, but also shows significant improvements in reasoning skills on standard benchmarks. This series represents a strategic advancement in AI capabilities, moving a previously weaker base model towards the state-of-the-art (SOTA) in reasoning tasks.

Links

Tags

loki-v2.6-8b-1024k

The following models were included in the merge: MrRobotoAI/Epic_Fiction-8b MrRobotoAI/Unaligned-RP-Base-8b-1024k MrRobotoAI/Loki-.Epic_Fiction.-8b Casual-Autopsy/L3-Luna-8B Casual-Autopsy/L3-Super-Nova-RP-8B Casual-Autopsy/L3-Umbral-Mind-RP-v3.0-8B Casual-Autopsy/Halu-L3-Stheno-BlackOasis-8B Undi95/Llama-3-LewdPlay-8B Undi95/Llama-3-LewdPlay-8B-evo Undi95/Llama-3-Unholy-8B ChaoticNeutrals/Hathor_Tahsin-L3-8B-v0.9 ChaoticNeutrals/Hathor_RP-v.01-L3-8B ChaoticNeutrals/Domain-Fusion-L3-8B ChaoticNeutrals/T-900-8B ChaoticNeutrals/Poppy_Porpoise-1.4-L3-8B ChaoticNeutrals/Templar_v1_8B ChaoticNeutrals/Hathor_Respawn-L3-8B-v0.8 ChaoticNeutrals/Sekhmet_Gimmel-L3.1-8B-v0.3 zeroblu3/LewdPoppy-8B-RP tohur/natsumura-storytelling-rp-1.0-llama-3.1-8b jeiku/Chaos_RP_l3_8B tannedbum/L3-Nymeria-Maid-8B Nekochu/Luminia-8B-RP vicgalle/Humanish-Roleplay-Llama-3.1-8B saishf/SOVLish-Maid-L3-8B Dogge/llama-3-8B-instruct-Bluemoon-Freedom-RP MrRobotoAI/Epic_Fiction-8b-v4 maldv/badger-lambda-0-llama-3-8b maldv/llama-3-fantasy-writer-8b maldv/badger-kappa-llama-3-8b maldv/badger-mu-llama-3-8b maldv/badger-lambda-llama-3-8b maldv/badger-iota-llama-3-8b maldv/badger-writer-llama-3-8b Magpie-Align/MagpieLM-8B-Chat-v0.1 nbeerbower/llama-3-gutenberg-8B nothingiisreal/L3-8B-Stheno-Horny-v3.3-32K nbeerbower/llama-3-spicy-abliterated-stella-8B Magpie-Align/MagpieLM-8B-SFT-v0.1 NeverSleep/Llama-3-Lumimaid-8B-v0.1 mlabonne/NeuralDaredevil-8B-abliterated mlabonne/Daredevil-8B-abliterated NeverSleep/Llama-3-Lumimaid-8B-v0.1-OAS nothingiisreal/L3-8B-Instruct-Abliterated-DWP openchat/openchat-3.6-8b-20240522 turboderp/llama3-turbcat-instruct-8b UCLA-AGI/Llama-3-Instruct-8B-SPPO-Iter3 Undi95/Llama-3-LewdPlay-8B TIGER-Lab/MAmmoTH2-8B-Plus OwenArli/Awanllm-Llama-3-8B-Cumulus-v1.0 refuelai/Llama-3-Refueled SicariusSicariiStuff/LLAMA-3_8B_Unaligned_Alpha NousResearch/Hermes-2-Theta-Llama-3-8B ResplendentAI/Nymph_8B grimjim/Llama-3-Oasis-v1-OAS-8B flammenai/Mahou-1.3b-llama3-8B lemon07r/Llama-3-RedMagic4-8B 
grimjim/Llama-3.1-SuperNova-Lite-lorabilterated-8B grimjim/Llama-Nephilim-Metamorphosis-v2-8B lemon07r/Lllama-3-RedElixir-8B grimjim/Llama-3-Perky-Pat-Instruct-8B ChaoticNeutrals/Hathor_RP-v.01-L3-8B grimjim/llama-3-Nephilim-v2.1-8B ChaoticNeutrals/Hathor_Respawn-L3-8B-v0.8 migtissera/Llama-3-8B-Synthia-v3.5 Locutusque/Llama-3-Hercules-5.0-8B WhiteRabbitNeo/Llama-3-WhiteRabbitNeo-8B-v2.0 VAGOsolutions/Llama-3-SauerkrautLM-8b-Instruct iRyanBell/ARC1-II HPAI-BSC/Llama3-Aloe-8B-Alpha HaitameLaf/Llama-3-8B-StoryGenerator failspy/Meta-Llama-3-8B-Instruct-abliterated-v3 Undi95/Llama-3-Unholy-8B ajibawa-2023/Uncensored-Frank-Llama-3-8B ajibawa-2023/SlimOrca-Llama-3-8B ChaoticNeutrals/Templar_v1_8B aifeifei798/llama3-8B-DarkIdol-2.2-Uncensored-1048K ChaoticNeutrals/Hathor_Tahsin-L3-8B-v0.9 Blackroot/Llama-3-Gamma-Twist FPHam/L3-8B-Everything-COT Blackroot/Llama-3-LongStory ChaoticNeutrals/Sekhmet_Gimmel-L3.1-8B-v0.3 abacusai/Llama-3-Smaug-8B Khetterman/CursedMatrix-8B-v9 ajibawa-2023/Scarlett-Llama-3-8B-v1.0 MrRobotoAI/Unaligned-RP-Base-8b-1024k + surya-narayanan/physics_non_masked MrRobotoAI/Unaligned-RP-Base-8b-1024k + surya-narayanan/electrical_engineering MrRobotoAI/Unaligned-RP-Base-8b-1024k + surya-narayanan/college_chemistry MrRobotoAI/Unaligned-RP-Base-8b-1024k + surya-narayanan/philosophy_non_masked MrRobotoAI/Unaligned-RP-Base-8b-1024k + surya-narayanan/college_physics MrRobotoAI/Unaligned-RP-Base-8b-1024k + surya-narayanan/philosophy MrRobotoAI/Unaligned-RP-Base-8b-1024k + surya-narayanan/formal_logic MrRobotoAI/Unaligned-RP-Base-8b-1024k + surya-narayanan/philosophy_100 MrRobotoAI/Unaligned-RP-Base-8b-1024k + surya-narayanan/conceptual_physics MrRobotoAI/Unaligned-RP-Base-8b-1024k + surya-narayanan/college_computer_science MrRobotoAI/Unaligned-RP-Base-8b-1024k + surya-narayanan/psychology_non_masked MrRobotoAI/Unaligned-RP-Base-8b-1024k + surya-narayanan/psychology MrRobotoAI/Unaligned-RP-Base-8b-1024k + Blackroot/Llama3-RP-Lora 
MrRobotoAI/Unaligned-RP-Base-8b-1024k + Azazelle/Llama-3-LimaRP-Instruct-LoRA-8B MrRobotoAI/Unaligned-RP-Base-8b-1024k + nothingiisreal/llama3-8B-DWP-lora MrRobotoAI/Unaligned-RP-Base-8b-1024k + surya-narayanan/world_religions MrRobotoAI/Unaligned-RP-Base-8b-1024k + surya-narayanan/high_school_european_history MrRobotoAI/Unaligned-RP-Base-8b-1024k + surya-narayanan/electrical_engineering MrRobotoAI/Unaligned-RP-Base-8b-1024k + Azazelle/Llama-3-8B-Abomination-LORA MrRobotoAI/Unaligned-RP-Base-8b-1024k + Azazelle/Llama-3-LongStory-LORA MrRobotoAI/Unaligned-RP-Base-8b-1024k + surya-narayanan/human_sexuality MrRobotoAI/Unaligned-RP-Base-8b-1024k + surya-narayanan/sociology MrRobotoAI/Unaligned-RP-Base-8b-1024k + ResplendentAI/Theory_of_Mind_Llama3 MrRobotoAI/Unaligned-RP-Base-8b-1024k + Azazelle/Smarts_Llama3 MrRobotoAI/Unaligned-RP-Base-8b-1024k + Azazelle/Llama-3-LongStory-LORA MrRobotoAI/Unaligned-RP-Base-8b-1024k + Azazelle/Nimue-8B MrRobotoAI/Unaligned-RP-Base-8b-1024k + vincentyandex/lora_llama3_chunked_novel_bs128 MrRobotoAI/Unaligned-RP-Base-8b-1024k + ResplendentAI/Aura_Llama3 MrRobotoAI/Unaligned-RP-Base-8b-1024k + Azazelle/L3-Daybreak-8b-lora MrRobotoAI/Unaligned-RP-Base-8b-1024k + ResplendentAI/Luna_Llama3 MrRobotoAI/Unaligned-RP-Base-8b-1024k + nicce/story-mixtral-8x7b-lora MrRobotoAI/Unaligned-RP-Base-8b-1024k + Blackroot/Llama-3-LongStory-LORA MrRobotoAI/Unaligned-RP-Base-8b-1024k + ResplendentAI/NoWarning_Llama3 MrRobotoAI/Unaligned-RP-Base-8b-1024k + ResplendentAI/BlueMoon_Llama3

Links

https://huggingface.co/QuantFactory/Loki-v2.6-8b-1024k-GGUF

Tags

deepseek-r1-distill-llama-8b

DeepSeek-R1 is our advanced first-generation reasoning model designed to enhance performance in reasoning tasks. Building on the foundation laid by its predecessor, DeepSeek-R1-Zero, which was trained using large-scale reinforcement learning (RL) without supervised fine-tuning, DeepSeek-R1 addresses the challenges faced by R1-Zero, such as endless repetition, poor readability, and language mixing. By incorporating cold-start data prior to the RL phase, DeepSeek-R1 significantly improves reasoning capabilities and achieves performance levels comparable to OpenAI-o1 across a variety of domains, including mathematics, coding, and complex reasoning tasks.

Links

Tags

locutusque_thespis-llama-3.1-8b

The Thespis family of language models is designed to enhance roleplaying performance through reasoning inspired by the Theory of Mind. Thespis-Llama-3.1-8B is a fine-tuned version of an abliterated Llama-3.1-8B model, optimized using Group Relative Policy Optimization (GRPO). The model is specifically rewarded for minimizing "slop" and repetition in its outputs, aiming to produce coherent and engaging text that maintains character consistency and avoids low-quality responses. This version represents an initial release; future iterations will incorporate a more rigorous fine-tuning process.

Links

Tags

deepseek-r1-distill-qwen-1.5b

DeepSeek-R1 is our advanced first-generation reasoning model designed to enhance performance in reasoning tasks. Building on the foundation laid by its predecessor, DeepSeek-R1-Zero, which was trained using large-scale reinforcement learning (RL) without supervised fine-tuning, DeepSeek-R1 addresses the challenges faced by R1-Zero, such as endless repetition, poor readability, and language mixing. By incorporating cold-start data prior to the RL phase, DeepSeek-R1 significantly improves reasoning capabilities and achieves performance levels comparable to OpenAI-o1 across a variety of domains, including mathematics, coding, and complex reasoning tasks.

Links

Tags

deepseek-r1-distill-qwen-7b

DeepSeek-R1 is our advanced first-generation reasoning model designed to enhance performance in reasoning tasks. Building on the foundation laid by its predecessor, DeepSeek-R1-Zero, which was trained using large-scale reinforcement learning (RL) without supervised fine-tuning, DeepSeek-R1 addresses the challenges faced by R1-Zero, such as endless repetition, poor readability, and language mixing. By incorporating cold-start data prior to the RL phase, DeepSeek-R1 significantly improves reasoning capabilities and achieves performance levels comparable to OpenAI-o1 across a variety of domains, including mathematics, coding, and complex reasoning tasks.

Links

Tags

deepseek-r1-distill-qwen-14b

DeepSeek-R1 is our advanced first-generation reasoning model designed to enhance performance in reasoning tasks. Building on the foundation laid by its predecessor, DeepSeek-R1-Zero, which was trained using large-scale reinforcement learning (RL) without supervised fine-tuning, DeepSeek-R1 addresses the challenges faced by R1-Zero, such as endless repetition, poor readability, and language mixing. By incorporating cold-start data prior to the RL phase, DeepSeek-R1 significantly improves reasoning capabilities and achieves performance levels comparable to OpenAI-o1 across a variety of domains, including mathematics, coding, and complex reasoning tasks.

Links

Tags

deepseek-r1-distill-qwen-32b

DeepSeek-R1 is our advanced first-generation reasoning model designed to enhance performance in reasoning tasks. Building on the foundation laid by its predecessor, DeepSeek-R1-Zero, which was trained using large-scale reinforcement learning (RL) without supervised fine-tuning, DeepSeek-R1 addresses the challenges faced by R1-Zero, such as endless repetition, poor readability, and language mixing. By incorporating cold-start data prior to the RL phase, DeepSeek-R1 significantly improves reasoning capabilities and achieves performance levels comparable to OpenAI-o1 across a variety of domains, including mathematics, coding, and complex reasoning tasks.

Links

Tags

deepseek-r1-distill-llama-70b

DeepSeek-R1 is our advanced first-generation reasoning model designed to enhance performance in reasoning tasks. Building on the foundation laid by its predecessor, DeepSeek-R1-Zero, which was trained using large-scale reinforcement learning (RL) without supervised fine-tuning, DeepSeek-R1 addresses the challenges faced by R1-Zero, such as endless repetition, poor readability, and language mixing. By incorporating cold-start data prior to the RL phase, DeepSeek-R1 significantly improves reasoning capabilities and achieves performance levels comparable to OpenAI-o1 across a variety of domains, including mathematics, coding, and complex reasoning tasks.

Links

Tags

fuseo1-deepseekr1-qwen2.5-coder-32b-preview-v0.1

FuseO1-Preview is our initial endeavor to enhance the System-II reasoning capabilities of large language models (LLMs) through innovative model fusion techniques. By employing our advanced SCE merging methodologies, we integrate multiple open-source o1-like LLMs into a unified model. Our goal is to incorporate the distinct knowledge and strengths from different reasoning LLMs into a single, unified model with strong System-II reasoning abilities, particularly in mathematics, coding, and science domains.

Links

Tags

fuseo1-deepseekr1-qwen2.5-instruct-32b-preview

FuseO1-Preview is our initial endeavor to enhance the System-II reasoning capabilities of large language models (LLMs) through innovative model fusion techniques. By employing our advanced SCE merging methodologies, we integrate multiple open-source o1-like LLMs into a unified model. Our goal is to incorporate the distinct knowledge and strengths from different reasoning LLMs into a single, unified model with strong System-II reasoning abilities, particularly in mathematics, coding, and science domains.

Links

Tags
