Model Gallery

18 models from 1 repository

menlo_jan-nano-128k
Jan-Nano-128k represents a significant advancement in compact language models for research applications. Building upon the success of Jan-Nano, this enhanced version features a native 128k context window that enables deeper, more comprehensive research capabilities without the performance degradation typically associated with context-extension methods.

Key Improvements:
- 🔍 Research deeper: extended context allows for processing entire research papers, lengthy documents, and complex multi-turn conversations.
- ⚡ Native 128k window: built from the ground up to handle long contexts efficiently, maintaining performance across the full context range.
- 📈 Enhanced performance: unlike traditional context-extension methods, Jan-Nano-128k shows improved performance with longer contexts.

This model maintains full compatibility with Model Context Protocol (MCP) servers while dramatically expanding the scope of research tasks it can handle in a single session.
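
As a hedged illustration of the long-context use case, a minimal sketch that sends a full document through LocalAI's OpenAI-compatible endpoint (the base URL and model name here are assumptions; adjust them to your instance):

```python
# Minimal long-context sketch against a LocalAI server; base_url and the
# model name are illustrative assumptions, not fixed values.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

with open("paper.txt") as f:  # a long document, e.g. a full research paper
    paper = f.read()

resp = client.chat.completions.create(
    model="menlo_jan-nano-128k",
    messages=[
        {"role": "system", "content": "You are a careful research assistant."},
        {"role": "user", "content": f"Summarize the key findings:\n\n{paper}"},
    ],
)
print(resp.choices[0].message.content)
```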

Repository: localai · License: apache-2.0

menlo_lucy-128k
Lucy is a compact but capable 1.7B model focused on agentic web search and lightweight browsing. Built on Qwen3-1.7B, Lucy inherits deep research capabilities from larger models while being optimized to run efficiently on mobile devices, even with CPU-only configurations. We achieved this through machine-generated task vectors that optimize thinking processes, smooth reward functions across multiple categories, and pure reinforcement learning without any supervised fine-tuning.

Repository: localai · License: apache-2.0

gemma-3-4b-it-max-horror-uncensored-dbl-x-imatrix
Google's newest Gemma-3 model, uncensored by David_AU (maintains instruction following / model performance and adds 4 layers to the model) and reinforced with an optional system prompt (see below). The "Horror Imatrix" was built using Grand Horror 16B (at my repo); it adds a "tint" of horror to the model. 5 examples are provided below (NSFW / F-Bombs galore) with prompts at IQ4XS (56 t/s on a mid-level card). Context: 128k.

"MAXED": the embed and output tensors are set at "BF16" (full precision) for all quants. This enhances quality, depth, and general performance at the cost of a slightly larger quant.

"HORROR IMATRIX": a strong, in-house imatrix dataset built by David_AU, which results in better overall function, instruction following, output quality, and stronger connections to ideas, concepts, and the world in general. This combines with "MAXing" the quant to improve performance.

Repository: localai · License: apache-2.0

llama-3.3-70b-instruct-ablated
Llama 3.3 Instruct 70B with 128k context and an ablation technique applied for a more helpful (and based) assistant. This means it will refuse fewer of your valid requests, for an uncensored UX. Use responsibly and use common sense. We do not take any responsibility for how you apply this intelligence, just as we do not for how you apply your own.

Repository: localai · License: llama3

deepcogito_cogito-v1-preview-llama-70b
The Cogito LLMs are instruction-tuned generative models (text in/text out). All models are released under an open license for commercial use. Cogito models are hybrid reasoning models: each model can answer directly (standard LLM) or self-reflect before answering (like reasoning models). The LLMs are trained using Iterated Distillation and Amplification (IDA), a scalable and efficient alignment strategy for superintelligence using iterative self-improvement. The models have been optimized for coding, STEM, instruction following, and general helpfulness, and have significantly higher multilingual, coding, and tool-calling capabilities than size-equivalent counterparts. In both standard and reasoning modes, Cogito v1-preview models outperform their size-equivalent counterparts on common industry benchmarks. Each model is trained in over 30 languages and supports a context length of 128k.
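
Per the upstream Cogito model card, the self-reflection mode is reportedly toggled with a dedicated system prompt; a hedged sketch through an OpenAI-compatible endpoint (the trigger phrase is taken from the upstream card, and the endpoint and model name are assumptions):

```python
# Hedged sketch: the upstream Cogito card reportedly uses this system
# prompt to switch from direct answering to self-reflection; endpoint and
# model name are illustrative assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="deepcogito_cogito-v1-preview-llama-70b",
    messages=[
        {"role": "system", "content": "Enable deep thinking subroutine."},
        {"role": "user", "content": "How many primes are there below 100?"},
    ],
)
print(resp.choices[0].message.content)
```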

Repository: localai · License: llama3.1

fireball-meta-llama-3.2-8b-instruct-agent-003-128k-code-dpo
The LLM model is a quantized version of EpistemeAI/Fireball-Meta-Llama-3.2-8B-Instruct-agent-003-128k-code-DPO, an experimental and revolutionary fine-tune with a DPO dataset that allows Llama 3.1 8B to be an agentic coder. It has built-in agent features such as search, a calculator, and ReAct. Other notable features include self-learning using unsloth, RAG applications, and memory. The context window of the model is 128k. It can be integrated into projects using popular libraries like Transformers and vLLM, and is suitable for use with Langchain or LlamaIndex. The model is developed by EpistemeAI and licensed under the Apache 2.0 license.
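
Since the description mentions integration via Transformers, a minimal sketch loading the upstream (unquantized) checkpoint named above (dtype, device mapping, and generation settings are assumptions):

```python
# Minimal Transformers sketch for the upstream checkpoint; dtype, device
# mapping, and token budget are illustrative assumptions.
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="EpistemeAI/Fireball-Meta-Llama-3.2-8B-Instruct-agent-003-128k-code-DPO",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
messages = [{"role": "user", "content": "Write a Python function that reverses a linked list."}]
out = pipe(messages, max_new_tokens=256)
print(out[0]["generated_text"][-1]["content"])
```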

Repository: localai · License: apache-2.0

deepcogito_cogito-v1-preview-llama-3b
The Cogito LLMs are instruction-tuned generative models (text in/text out). All models are released under an open license for commercial use. Cogito models are hybrid reasoning models: each model can answer directly (standard LLM) or self-reflect before answering (like reasoning models). The LLMs are trained using Iterated Distillation and Amplification (IDA), a scalable and efficient alignment strategy for superintelligence using iterative self-improvement. The models have been optimized for coding, STEM, instruction following, and general helpfulness, and have significantly higher multilingual, coding, and tool-calling capabilities than size-equivalent counterparts. In both standard and reasoning modes, Cogito v1-preview models outperform their size-equivalent counterparts on common industry benchmarks. Each model is trained in over 30 languages and supports a context length of 128k.

Repository: localai · License: llama3.2

deepcogito_cogito-v1-preview-llama-8b
The Cogito LLMs are instruction-tuned generative models (text in/text out). All models are released under an open license for commercial use. Cogito models are hybrid reasoning models: each model can answer directly (standard LLM) or self-reflect before answering (like reasoning models). The LLMs are trained using Iterated Distillation and Amplification (IDA), a scalable and efficient alignment strategy for superintelligence using iterative self-improvement. The models have been optimized for coding, STEM, instruction following, and general helpfulness, and have significantly higher multilingual, coding, and tool-calling capabilities than size-equivalent counterparts. In both standard and reasoning modes, Cogito v1-preview models outperform their size-equivalent counterparts on common industry benchmarks. Each model is trained in over 30 languages and supports a context length of 128k.

Repository: localai · License: llama3.1

astrosage-70b
Developed by: AstroMLab (Tijmen de Haan, Yuan-Sen Ting, Tirthankar Ghosal, Tuan Dung Nguyen, Alberto Accomazzi, Emily Herron, Vanessa Lama, Azton Wells, Nesar Ramachandra, Rui Pan)

Funded by: Oak Ridge Leadership Computing Facility (OLCF), a DOE Office of Science User Facility at Oak Ridge National Laboratory (U.S. Department of Energy); Microsoft's Accelerating Foundation Models Research (AFMR) program; World Premier International Research Center Initiative (WPI), MEXT, Japan; National Science Foundation (NSF); UChicago Argonne LLC, Operator of Argonne National Laboratory (U.S. Department of Energy).

Reference paper: Tijmen de Haan et al. (2025). "AstroMLab 4: Benchmark-Topping Performance in Astronomy Q&A with a 70B-Parameter Domain-Specialized Reasoning Model". https://arxiv.org/abs/2505.17592

Model type: autoregressive transformer-based LLM, specialized in astronomy, astrophysics, space science, astroparticle physics, cosmology, and astronomical instrumentation.

Model architecture: AstroSage-70B is a fine-tuned derivative of the Meta-Llama-3.1-70B architecture, making no architectural changes. The Llama-3.1-70B-Instruct tokenizer is also used without modification.

Context length: fine-tuned on 8192-token sequences; the base model was trained to a 128k context length.

AstroSage-70B is a large-scale, domain-specialized language model tailored for research and education in astronomy, astrophysics, space science, cosmology, and astronomical instrumentation. It builds on the Llama-3.1-70B foundation model, enhanced through extensive continued pre-training (CPT) on a vast corpus of astronomical literature, further refined with supervised fine-tuning (SFT) on instruction-following datasets, and finally enhanced via parameter averaging (model merging) with other popular fine-tunes. AstroSage-70B aims to achieve state-of-the-art performance on astronomy-specific tasks, providing researchers, students, and enthusiasts with an advanced AI assistant. This 70B-parameter model represents a significant scaling up from AstroSage-8B. The primary enhancements over AstroSage-8B are:
- A stronger base model and a higher parameter count for increased capacity
- Improved datasets
- Improved learning hyperparameters
- Reasoning capability (can be enabled or disabled at inference time)

Training lineage:
- Base model: Meta-Llama-3.1-70B.
- Continued pre-training (CPT): the base model underwent 2.5 epochs of CPT (168k GPU-hours) on a specialized astronomy corpus (largely inherited from AstroSage-8B) to produce AstroSage-70B-CPT. This stage imbues domain-specific knowledge and language nuances.
- Supervised fine-tuning (SFT): AstroSage-70B-CPT was then fine-tuned for 0.6 epochs (13k GPU-hours) on astronomy-relevant and general-purpose instruction-following datasets, resulting in AstroSage-70B-SFT.
- Final mixture: the released AstroSage-70B is created via parameter averaging / model merging using DARE-TIES with rescale: true and lambda: 1.2, with AstroSage-70B-CPT designated as the "base model": 70% AstroSage-70B-SFT (density 0.7), 15% Llama-3.1-Nemotron-70B-Instruct (density 0.5), 7.5% Llama-3.3-70B-Instruct (density 0.5), and 7.5% Llama-3.1-70B-Instruct (density 0.5). See the toy sketch below.

Intended use: like AstroSage-8B, this model can be used for a variety of LLM applications, including:
- Providing factual information and explanations in astronomy, astrophysics, cosmology, and instrumentation
- Assisting with literature reviews and summarizing scientific papers
- Answering domain-specific questions with high accuracy
- Brainstorming research ideas and formulating hypotheses
- Assisting with programming tasks related to astronomical data analysis
- Serving as an educational tool for learning astronomical concepts
- Potentially forming the core of future agentic research assistants capable of more autonomous scientific tasks
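
To make the final-mixture step concrete, here is a toy numpy sketch of the DARE-TIES averaging described in the training lineage, using the published weights, densities, and lambda (a conceptual illustration on random matrices, not the actual merge pipeline):

```python
# Toy numpy sketch of the DARE-TIES merge described above; weights,
# densities, and lambda come from the card, and random matrices stand in
# for real model tensors.
import numpy as np

rng = np.random.default_rng(0)

def dare(delta, density):
    """DARE step: randomly drop (1 - density) of a delta, rescale the rest."""
    mask = rng.random(delta.shape) < density
    return delta * mask / density

def dare_ties(base, models, weights, densities, lam=1.2):
    deltas = [dare(m - base, d) * w for m, w, d in zip(models, weights, densities)]
    stacked = np.stack(deltas)
    # TIES sign election: keep only contributions that agree with the
    # dominant sign at each parameter.
    sign = np.sign(stacked.sum(axis=0))
    agree = np.where(np.sign(stacked) == sign, stacked, 0.0)
    return base + lam * agree.sum(axis=0)

base = rng.normal(size=(4, 4))  # stands in for AstroSage-70B-CPT
experts = [base + rng.normal(size=(4, 4)) for _ in range(4)]  # SFT, Nemotron, 3.3, 3.1
merged = dare_ties(base, experts,
                   weights=[0.70, 0.15, 0.075, 0.075],
                   densities=[0.7, 0.5, 0.5, 0.5])
print(merged.round(2))
```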

Repository: localai · License: llama3.1

chronos-gold-12b-1.0
Chronos Gold 12B 1.0 is a unique model that applies to domains such as general chatbot functionality, roleplay, and storywriting. The model has been observed to write up to 2250 tokens in a single sequence. It was trained at a sequence length of 16384 (16k) and still retains the apparent 128k context length from Mistral-Nemo, though this deteriorates over longer contexts, as regular Nemo does based on the RULER test. As a result, it is recommended to cap your sequence length at 16384, or you will experience performance degradation. The base model is mistralai/Mistral-Nemo-Base-2407, which was heavily modified to produce a more coherent model, comparable to much larger models. Chronos Gold 12B-1.0 re-creates the uniqueness of the original Chronos with significantly enhanced prompt adherence (following), coherence, and a modern dataset, as well as supporting a majority of "character card" formats in applications like SillyTavern. It went through an iterative and objective merge process, as with my previous models, and was further fine-tuned on a dataset curated for it. The specifics of the model will not be disclosed at this time due to dataset ownership.
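
A minimal llama-cpp-python sketch that caps the context at the recommended 16384 tokens (the GGUF file name and sampling settings are assumptions):

```python
# Caps n_ctx at 16384 per the recommendation above; the model file name
# is an illustrative assumption.
from llama_cpp import Llama

llm = Llama(model_path="chronos-gold-12b-1.0.Q4_K_M.gguf", n_ctx=16384)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Continue: The lighthouse went dark at midnight."}],
    max_tokens=512,
)
print(out["choices"][0]["message"]["content"])
```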

Repository: localai · License: apache-2.0

mistralai_mistral-small-3.1-24b-instruct-2503
Building upon Mistral Small 3 (2501), Mistral Small 3.1 (2503) adds state-of-the-art vision understanding and enhances long context capabilities up to 128k tokens without compromising text performance. With 24 billion parameters, this model achieves top-tier capabilities in both text and vision tasks. This model is an instruction-finetuned version of: Mistral-Small-3.1-24B-Base-2503. Mistral Small 3.1 can be deployed locally and is exceptionally "knowledge-dense," fitting within a single RTX 4090 or a 32GB RAM MacBook once quantized.

Repository: localai · License: apache-2.0

mistralai_mistral-small-3.1-24b-instruct-2503-multimodal
Building upon Mistral Small 3 (2501), Mistral Small 3.1 (2503) adds state-of-the-art vision understanding and enhances long context capabilities up to 128k tokens without compromising text performance. With 24 billion parameters, this model achieves top-tier capabilities in both text and vision tasks. This model is an instruction-finetuned version of: Mistral-Small-3.1-24B-Base-2503. Mistral Small 3.1 can be deployed locally and is exceptionally "knowledge-dense," fitting within a single RTX 4090 or a 32GB RAM MacBook once quantized. This gallery entry includes mmproj for multimodality.
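
Because this entry ships the mmproj projector, image inputs can go through the usual OpenAI-style vision message; a hedged sketch (the endpoint, model name, and image URL are assumptions):

```python
# Vision-request sketch via an OpenAI-compatible endpoint; base_url,
# model name, and image URL are illustrative assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="mistralai_mistral-small-3.1-24b-instruct-2503-multimodal",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
        ],
    }],
)
print(resp.choices[0].message.content)
```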

Repository: localai · License: apache-2.0

mistralai_devstral-small-2505
Devstral is an agentic LLM for software engineering tasks built in a collaboration between Mistral AI and All Hands AI 🙌. Devstral excels at using tools to explore codebases, editing multiple files, and powering software engineering agents. The model achieves remarkable performance on SWE-bench, which positions it as the #1 open-source model on this benchmark. It is fine-tuned from Mistral-Small-3.1 and therefore has a long context window of up to 128k tokens. As a coding agent, Devstral is text-only: before fine-tuning from Mistral-Small-3.1, the vision encoder was removed. For enterprises requiring specialized capabilities (increased context, domain-specific knowledge, etc.), we will release commercial models beyond what Mistral AI contributes to the community. Learn more about Devstral in our blog post.

Key Features:
- Agentic coding: Devstral is designed to excel at agentic coding tasks, making it a great choice for software engineering agents (see the tool-calling sketch below).
- Lightweight: at just 24 billion parameters, Devstral is light enough to run on a single RTX 4090 or a Mac with 32GB RAM, making it an appropriate model for local deployment and on-device use.
- Apache 2.0 License: open license allowing usage and modification for both commercial and non-commercial purposes.
- Context Window: a 128k context window.
- Tokenizer: uses a Tekken tokenizer with a 131k vocabulary size.
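
A hedged sketch of the agentic tool-use pattern via the OpenAI-compatible tools API (the read_file tool schema, endpoint, and model name are illustrative assumptions, not part of the model itself):

```python
# Tool-calling sketch; the read_file tool, endpoint, and model name are
# illustrative assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

tools = [{
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Read a file from the repository.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]

resp = client.chat.completions.create(
    model="mistralai_devstral-small-2505",
    messages=[{"role": "user", "content": "Find the bug in src/main.py."}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)
```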

Repository: localai · License: apache-2.0

mistralai_magistral-small-2506
Building upon Mistral Small 3.1 (2503), with reasoning capabilities added through SFT on Magistral Medium traces and RL on top, Magistral Small is a small, efficient reasoning model with 24B parameters. It can be deployed locally, fitting within a single RTX 4090 or a 32GB RAM MacBook once quantized. Learn more about Magistral in our blog post.

Key Features:
- Reasoning: capable of long chains of reasoning traces before providing an answer.
- Multilingual: supports dozens of languages, including English, French, German, Greek, Hindi, Indonesian, Italian, Japanese, Korean, Malay, Nepali, Polish, Portuguese, Romanian, Russian, Serbian, Spanish, Swedish, Turkish, Ukrainian, Vietnamese, Arabic, Bengali, Chinese, and Farsi.
- Apache 2.0 License: open license allowing usage and modification for both commercial and non-commercial purposes.
- Context Window: a 128k context window, but performance might degrade past 40k; hence we recommend setting the maximum model length to 40k (see the sketch below).
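
A minimal vLLM sketch that follows the 40k recommendation (the Hugging Face model id and sampling values are assumptions):

```python
# Caps max_model_len at ~40k per the recommendation above; model id and
# sampling values are illustrative assumptions.
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Magistral-Small-2506", max_model_len=40960)
params = SamplingParams(temperature=0.7, max_tokens=2048)
out = llm.generate(["Prove that the square root of 2 is irrational."], params)
print(out[0].outputs[0].text)
```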

Repository: localai · License: apache-2.0

mistral-small-3.2-46b-the-brilliant-raconteur-ii-instruct-2506
WARNING: MADNESS - UNHINGED and... NSFW. Vivid prose. INTENSE. Visceral details. Violence. HORROR. GORE. Swearing. UNCENSORED... humor, romance, fun.

Mistral-Small-3.2-46B-The-Brilliant-Raconteur-II-Instruct-2506. This repo contains the full-precision source code, in "safetensors" format, to generate GGUFs, GPTQ, EXL2, AWQ, HQQ, and other formats. The source code can also be used directly.

ABOUT: A stronger, more creative Mistral (Mistral-Small-3.2-24B-Instruct-2506) extended to 79 layers and 46B parameters with Brainstorm 40x by DavidAU (details at the very bottom of the page). This is version II, which has a jump in detail and raw emotion relative to version I. This model pushes Mistral's Instruct 2506 to the limit: regens will be very different, even with the same prompt and settings; output will vary vastly on each generation; reasoning will be changed, and is often shorter; prose, creativity, word choice, and general "flow" are improved. Several system prompts below push this model even further. The model is partly de-censored / abliterated (most Mistrals are more uncensored than most other models anyway). It can also be used for coding, even at low quants, and for all other use cases. As an instruct model, it thrives on instructions, both in the system prompt and/or the prompt itself. One example below with 3 generations using Q4_K_S; a second example below with 2 generations using Q4_K_S.

Quick details: the model has 128k context and uses the embedded Jinja template or the ChatML template. Reasoning can be turned on/off (see system prompts below) and is OFF by default. A temp range of 0.1 to 1 is suggested, with 1-2 for enhanced creativity; above temp 2 it is strong but can be very different. Rep pen range: 1 (off) or very light, 1.01 to 1.05 (the model is sensitive to rep pen, which affects reasoning and generation length). For creative/brainstorming use, 2-5 generations are suggested due to variations caused by Brainstorm; see the sampler sketch below.

Observations: Sometimes using the ChatML (or Alpaca or other) template instead of Jinja results in stronger creative generation. The model can be operated with no system prompt, but a system prompt will enhance generation. Longer, more detailed prompts with more instructions result in much stronger generations. For prose directives you may need to add qualifiers, because the model may follow your instructions too closely, e.g. "use short sentences" vs "use short sentences sparsely". Reasoning (on) can lead to better creative generation, but sometimes generation with reasoning off is better. A rep pen of up to 1.05 may be needed on Q2_K/Q3_K_S quants for some prompts to address "low-bit" issues. Detailed settings, system prompts, how-tos, and examples below.

NOTES: Image generation should also be possible with this model, just like the base model; Brainstorm was not applied to the image-generation systems of the model... yet. This is version II and subject to change / revision. This model is a slightly different version of: https://huggingface.co/DavidAU/Mistral-Small-3.2-46B-The-Brilliant-Raconteur-Instruct-2506
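
A llama-cpp-python sketch of the suggested sampler ranges, drawing several generations as advised (the GGUF file name, prompt, and exact values are assumptions within the ranges above):

```python
# Sampler sketch within the suggested ranges (temp 1-2 for creative work,
# rep pen <= 1.05); the file name and prompt are illustrative assumptions.
from llama_cpp import Llama

llm = Llama(model_path="The-Brilliant-Raconteur-II-46B.Q4_K_S.gguf", n_ctx=16384)

for i in range(3):  # Brainstorm makes regens diverge, so sample a few
    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Write the opening of a gothic horror story."}],
        temperature=1.2,
        repeat_penalty=1.02,
        max_tokens=400,
    )
    print(f"--- generation {i + 1} ---")
    print(out["choices"][0]["message"]["content"])
```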

Repository: localai · License: apache-2.0

mistralai_devstral-small-2507
Devstral is an agentic LLM for software engineering tasks built in a collaboration between Mistral AI and All Hands AI 🙌. Devstral excels at using tools to explore codebases, editing multiple files, and powering software engineering agents. The model achieves remarkable performance on SWE-bench, which positions it as the #1 open-source model on this benchmark. It is fine-tuned from Mistral-Small-3.1 and therefore has a long context window of up to 128k tokens. As a coding agent, Devstral is text-only: before fine-tuning from Mistral-Small-3.1, the vision encoder was removed.

Repository: localai · License: apache-2.0

mistral-community_pixtral-12b
Highlights:
- Natively multimodal, trained with interleaved image and text data
- Strong performance on multimodal tasks, excels in instruction following
- Maintains state-of-the-art performance on text-only benchmarks

Architecture:
- New 400M-parameter vision encoder trained from scratch
- 12B-parameter multimodal decoder based on Mistral Nemo
- Supports variable image sizes and aspect ratios
- Supports multiple images in the long context window of 128k tokens

Repository: localai · License: apache-2.0

openvino-phi3
Attention: Trust Remote Code is required for this model.
An OpenVINO-optimized version of the Phi-3 Mini instruction-tuned model with 3.8 billion parameters. It supports a 128k context window and is designed for reasoning, coding, and chat tasks in compute-constrained environments.
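
A hedged optimum-intel sketch showing the trust_remote_code flag this entry calls out (the upstream model id is an assumption):

```python
# OpenVINO loading sketch with trust_remote_code=True, as this entry
# requires; the upstream model id is an illustrative assumption.
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "microsoft/Phi-3-mini-128k-instruct"
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = OVModelForCausalLM.from_pretrained(model_id, trust_remote_code=True, export=True)

inputs = tok("Explain branch prediction in one paragraph.", return_tensors="pt")
print(tok.decode(model.generate(**inputs, max_new_tokens=128)[0], skip_special_tokens=True))
```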

Repository: localai · License: mit