LocalAI - Models

qwen3-vl-embedding-8b

**Model Name:** Qwen3-VL-Embedding-8B **Base Model:** Qwen/Qwen3-VL-8B-Instruct **Description:** The **Qwen3-VL-Embedding** and **Qwen3-VL-Reranker** model series are the latest additions to the Qwen family, built upon the recently open-sourced and powerful Qwen3-VL foundation model. Specifically designed for multimodal information retrieval and cross-modal understanding, this suite accepts diverse inputs including text, images, screenshots, and videos, as well as inputs containing a mixture of these modalities. **Key Features:** - Model Type: MultiModal Embedding - Supported Languages: 30+ Languages - Supported Input Modalities: Text, images, screenshots, videos, and arbitrary multimodal combinations (e.g., text + image, text + video) - Number of Parameters: 8B - Context Length: 32k - Embedding Dimension: Up to 4096, supports user-defined output dimensions ranging from 64 to 4096 **Downloads:** - [GGUF Files](https://huggingface.co/Qwen/Qwen3-VL-Embedding-8B) (e.g., `Qwen3-VL-Embedding-8B-Q8_0.gguf`). **Usage:** - Requires `transformers`, `qwen-vl-utils`, and `torch`. - Example: `from scripts.qwen3_vl_embedding import Qwen3VLEmbedder model = Qwen3VLEmbedder(...)` **Citation:** @article{qwen3vlembedding, ...} This description emphasizes its capabilities, efficiency, and versatility for multimodal search tasks.

Links

Tags

qwen3-vl-embedding-2b

**Model Name:** Qwen3-VL-Embedding-2B **Base Model:** Qwen/Qwen3-VL-2B-Instruct **Description:** The **Qwen3-VL-Embedding** and **Qwen3-VL-Reranker** model series are the latest additions to the Qwen family, built upon the recently open-sourced and powerful Qwen3-VL foundation model. Specifically designed for multimodal information retrieval and cross-modal understanding, this suite accepts diverse inputs including text, images, screenshots, and videos, as well as inputs containing a mixture of these modalities. **Key Features:** - Model Type: MultiModal Embedding - Supported Languages: 30+ Languages - Supported Input Modalities: Text, images, screenshots, videos, and arbitrary multimodal combinations (e.g., text + image, text + video) - Number of Parameters: 2B - Context Length: 32k - Embedding Dimension: Up to 2048, supports user-defined output dimensions ranging from 64 to 2048 **Downloads:** - [GGUF Files](https://huggingface.co/Qwen/Qwen3-VL-Embedding-2B) (e.g., `Qwen3-VL-Embedding-2B-Q8_0.gguf`). **Usage:** - Requires `transformers`, `qwen-vl-utils`, and `torch`. - Example: `from scripts.qwen3_vl_embedding import Qwen3VLEmbedder model = Qwen3VLEmbedder(...)` **Citation:** @article{qwen3vlembedding, ...} This description emphasizes its capabilities, efficiency, and versatility for multimodal search tasks.

Links

Tags

qwen3-vl-reranker-8b

**Model Name:** Qwen3-VL-Reranker-8B **Base Model:** Qwen/Qwen3-VL-Reranker-8B **Description:** A high-performance multimodal reranking model for state-of-the-art cross-modal search. It supports 30+ languages and handles text, images, screenshots, videos, and mixed modalities. With 8B parameters and a 32K context length, it refines retrieval results by combining embedding vectors with precise relevance scores. Optimized for efficiency, it supports quantized versions (e.g., Q8_0, Q4_K_M) and is ideal for applications requiring accurate multimodal content matching. **Key Features:** - **Multimodal**: Text, images, videos, and mixed content. - **Language Support**: 30+ languages. - **Quantization**: Available in Q8_0 (best quality), Q4_K_M (fast, recommended), and lower-precision options. - **Performance**: Outperforms base models in retrieval tasks (e.g., JinaVDR, ViDoRe v3). - **Use Case**: Enhances search pipelines by refining embeddings with precise relevance scores. **Downloads:** - [GGUF Files](https://huggingface.co/mradermacher/Qwen3-VL-Reranker-8B-GGUF) (e.g., `Qwen3-VL-Reranker-8B.Q8_0.gguf`). **Usage:** - Requires `transformers`, `qwen-vl-utils`, and `torch`. - Example: `from scripts.qwen3_vl_reranker import Qwen3VLReranker; model = Qwen3VLReranker(...)` **Citation:** @article{qwen3vlembedding, ...} This description emphasizes its capabilities, efficiency, and versatility for multimodal search tasks.

Links

https://huggingface.co/mradermacher/Qwen3-VL-Reranker-8B-GGUF

Tags

qwen3-vl-reranker-2b-i1

**Model Name:** Qwen3-VL-Reranker-2B-i1 **Base Model:** Qwen/Qwen3-VL-Reranker-2B **Description:** A high-performance multimodal reranking model for state-of-the-art cross-modal search. It supports 30+ languages and handles text, images, screenshots, videos, and mixed modalities. With 8B parameters and a 32K context length, it refines retrieval results by combining embedding vectors with precise relevance scores. Optimized for efficiency, it supports quantized versions (e.g., Q8_0, Q4_K_M) and is ideal for applications requiring accurate multimodal content matching. **Key Features:** - **Multimodal**: Text, images, videos, and mixed content. - **Language Support**: 30+ languages. - **Quantization**: Available in Q8_0 (best quality), Q4_K_M (fast, recommended), and lower-precision options. - **Performance**: Outperforms base models in retrieval tasks (e.g., JinaVDR, ViDoRe v3). - **Use Case**: Enhances search pipelines by refining embeddings with precise relevance scores. **Downloads:** - [GGUF Files](https://huggingface.co/mradermacher/Qwen3-VL-Reranker-2B-i1-GGUF) (e.g., `Qwen3-VL-Reranker-2B.i1-Q4_K_M.gguf`). **Usage:** - Requires `transformers`, `qwen-vl-utils`, and `torch`. - Example: `from scripts.qwen3_vl_reranker import Qwen3VLReranker; model = Qwen3VLReranker(...)` **Citation:** @article{qwen3vlembedding, ...} This description emphasizes its capabilities, efficiency, and versatility for multimodal search tasks.

Links

https://huggingface.co/mradermacher/Qwen3-VL-Reranker-2B-i1-GGUF

Tags

huihui-ai_huihui-gpt-oss-20b-bf16-abliterated

This is an uncensored version of unsloth/gpt-oss-20b-BF16 created with abliteration (see remove-refusals-with-transformers to know more about it).

Links

Tags

dream-org_dream-v0-instruct-7b

This is the instruct model of Dream 7B, which is an open diffusion large language model with top-tier performance.

Links

Tags

huihui-ai_qwen3-14b-abliterated

This is an uncensored version of Qwen/Qwen3-14B created with abliteration (see remove-refusals-with-transformers to know more about it). This is a crude, proof-of-concept implementation to remove refusals from an LLM model without using TransformerLens. Ablation was performed using a new and faster method, which yields better results.

Links

Tags

huihui-jan-nano-abliterated

This is an uncensored version of Menlo/Jan-nano created with abliteration (see remove-refusals-with-transformers to know more about it). This is a crude, proof-of-concept implementation to remove refusals from an LLM model without using TransformerLens. Ablation was performed using a new and faster method, which yields better results.

Links

Tags

huihui-ai_gemma-3-1b-it-abliterated

This is an uncensored version of google/gemma-3-1b-it created with abliteration (see remove-refusals-with-transformers to know more about it). This is a crude, proof-of-concept implementation to remove refusals from an LLM model without using TransformerLens

Links

Tags

huihui-ai_huihui-gemma-3n-e4b-it-abliterated

This is an uncensored version of google/gemma-3n-E4B-it created with abliteration (see remove-refusals-with-transformers to know more about it). This is a crude, proof-of-concept implementation to remove refusals from an LLM model without using TransformerLens. It was only the text part that was processed, not the image part. After abliterated, it seems like more output content has been opened from a magic box.

Links

Tags

falcon3-1b-instruct-abliterated

This is an uncensored version of tiiuae/Falcon3-1B-Instruct created with abliteration (see remove-refusals-with-transformers to know more about it). This is a crude, proof-of-concept implementation to remove refusals from an LLM model without using TransformerLens.

Links

Tags

falcon3-3b-instruct-abliterated

This is an uncensored version of tiiuae/Falcon3-3B-Instruct created with abliteration (see remove-refusals-with-transformers to know more about it). This is a crude, proof-of-concept implementation to remove refusals from an LLM model without using TransformerLens.

Links

Tags

falcon3-10b-instruct-abliterated

This is an uncensored version of tiiuae/Falcon3-10B-Instruct created with abliteration (see remove-refusals-with-transformers to know more about it). This is a crude, proof-of-concept implementation to remove refusals from an LLM model without using TransformerLens.

Links

Tags

falcon3-7b-instruct-abliterated

This is an uncensored version of tiiuae/Falcon3-7B-Instruct created with abliteration (see remove-refusals-with-transformers to know more about it). This is a crude, proof-of-concept implementation to remove refusals from an LLM model without using TransformerLens.

Links

Tags

llama-3.2-3b-agent007-coder

The Llama-3.2-3B-Agent007-Coder-GGUF is a quantized version of the EpistemeAI/Llama-3.2-3B-Agent007-Coder model, which is a fine-tuned version of the unsloth/llama-3.2-3b-instruct-bnb-4bit model. It is created using llama.cpp and trained with additional datasets such as the Agent dataset, Code Alpaca 20K, and magpie ultra 0.1. This model is optimized for multilingual dialogue use cases and agentic retrieval and summarization tasks. The model is available for commercial and research use in multiple languages and is best used with the transformers library.

Links

https://huggingface.co/QuantFactory/Llama-3.2-3B-Agent007-Coder-GGUF

Tags

smollm-1.7b-instruct

SmolLM is a series of small language models available in three sizes: 135M, 360M, and 1.7B parameters. These models are pre-trained on SmolLM-Corpus, a curated collection of high-quality educational and synthetic data designed for training LLMs. For further details, we refer to our blogpost. To build SmolLM-Instruct, we finetuned the base models on publicly available datasets.

Links

Tags

huihui-ai_deepseek-r1-distill-llama-70b-abliterated

This is an uncensored version of deepseek-ai/DeepSeek-R1-Distill-Llama-70B created with abliteration (see remove-refusals-with-transformers to know more about it). This is a crude, proof-of-concept implementation to remove refusals from an LLM model without using TransformerLens.

Links

Tags

openvino-multilingual-e5-base

Multilingual E5 base embedding model optimized for semantic similarity and retrieval tasks. Supports OpenVINO and ONNX inference formats. Ideal for cross-lingual vector search and semantic matching.

Links

https://huggingface.co/intfloat/multilingual-e5-base

Tags

openvino-all-MiniLM-L6-v2

This sentence-transformers model maps text to 384-dimensional dense vectors for semantic similarity tasks. Based on the MiniLM architecture, it is optimized for OpenVINO inference. Ideal for retrieval-augmented generation (RAG) pipelines.

Links

https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2

Tags

aevum-0.6b-finetuned

**Model Name:** Aevum-0.6B-Finetuned **Base Model:** Qwen3-0.6B **Architecture:** Decoder-only Transformer **Parameters:** 0.6 Billion **Task:** Code Generation, Instruction Following **Languages:** English, Python (optimized for code) **License:** Apache 2.0 **Overview:** Aevum-0.6B-Finetuned is a highly efficient, small-scale language model fine-tuned for code generation and task following. Built on the Qwen3-0.6B foundation, it delivers strong performance—achieving a **HumanEval Pass@1 score of 21.34%**—making it the most parameter-efficient sub-1B model in its category. **Key Features:** - Optimized for low-latency inference on CPU and edge devices. - Fine-tuned on MBPP and DeepMind Code Contests for superior code generation accuracy. - Ideal for lightweight development, education, and prototyping. **Use Case:** Perfect for developers and researchers needing a fast, compact, and open model for Python code generation without requiring high-end hardware. **Performance Benchmark:** Outperforms larger models in efficiency: comparable to models 10x its size in task accuracy. **Cite:** @misc{aveum06B2025, title={aevum-0.6B-Finetuned: Lightweight Python Code Generation Model}, author={anonymous}, year={2025}} **Try it:** Use via Hugging Face `transformers` library with minimal setup. 👉 [Model Page on Hugging Face](https://huggingface.co/Aevum-Official/aveum-0.6B-Finetuned)

Links

https://huggingface.co/mradermacher/Aevum-0.6B-Finetuned-GGUF

Tags

gpt-oss-20b-esper3.1-i1

**Model Name:** gpt-oss-20b-Esper3.1 **Repository:** [ValiantLabs/gpt-oss-20b-Esper3.1](https://huggingface.co/ValiantLabs/gpt-oss-20b-Esper3.1) **Base Model:** openai/gpt-oss-20b **Type:** Instruction-tuned, reasoning-focused language model **Size:** 20 billion parameters **License:** Apache 2.0 --- ### 🔍 **Overview** gpt-oss-20b-Esper3.1 is a specialized, instruction-tuned variant of the 20B open-source GPT model, developed by **Valiant Labs**. It excels in **advanced coding, software architecture, and DevOps reasoning**, making it ideal for technical problem-solving and AI-driven engineering tasks. ### ✨ **Key Features** - **Expert in DevOps & Cloud Systems:** Trained on high-difficulty datasets (e.g., Titanium3, Tachibana3, Mitakihara), it delivers precise, actionable guidance for AWS, Kubernetes, Terraform, Ansible, Docker, Jenkins, and more. - **Strong Code Reasoning:** Optimized for complex programming tasks, including full-stack development, scripting, and debugging. - **High-Quality Inference:** Uses `bf16` precision for full-precision performance; quantized versions (e.g., GGUF) available for efficient local inference. - **Open-Source & Free to Use:** Fully open-access, built on the public gpt-oss-20b foundation and trained with community datasets. ### 📌 **Use Cases** - Designing scalable cloud architectures - Writing and optimizing infrastructure-as-code - Debugging complex DevOps pipelines - AI-assisted software development and documentation - Real-time technical troubleshooting ### 💡 **Getting Started** Use the standard `text-generation` pipeline with the `transformers` library. Supports role-based prompting (e.g., `user`, `assistant`) and performs best with high-reasoning prompts. ```python from transformers import pipeline pipe = pipeline("text-generation", model="ValiantLabs/gpt-oss-20b-Esper3.1", torch_dtype="auto", device_map="auto") messages = [{"role": "user", "content": "Design a Kubernetes cluster for a high-traffic web app with CI/CD via GitHub Actions."}] outputs = pipe(messages, max_new_tokens=2000) print(outputs[0]["generated_text"][-1]) ``` --- > 🔗 **Model Gallery Entry**: > *gpt-oss-20b-Esper3.1 – A powerful, open-source 20B model tuned for expert-level DevOps, coding, and system architecture. Built by Valiant Labs using high-quality technical datasets. Perfect for engineers, architects, and AI developers.*

Links

https://huggingface.co/mradermacher/gpt-oss-20b-Esper3.1-i1-GGUF

Tags

Model Gallery

Filter by type:

Filter by tags:

qwen3-vl-embedding-8b

qwen3-vl-embedding-2b

qwen3-vl-reranker-8b

qwen3-vl-reranker-2b-i1

huihui-ai_huihui-gpt-oss-20b-bf16-abliterated

dream-org_dream-v0-instruct-7b

huihui-ai_qwen3-14b-abliterated

huihui-jan-nano-abliterated

huihui-ai_gemma-3-1b-it-abliterated

huihui-ai_huihui-gemma-3n-e4b-it-abliterated

falcon3-1b-instruct-abliterated

falcon3-3b-instruct-abliterated

falcon3-10b-instruct-abliterated

falcon3-7b-instruct-abliterated

llama-3.2-3b-agent007-coder

smollm-1.7b-instruct

huihui-ai_deepseek-r1-distill-llama-70b-abliterated

openvino-multilingual-e5-base

openvino-all-MiniLM-L6-v2

aevum-0.6b-finetuned

gpt-oss-20b-esper3.1-i1