Repository: localai
License: apache-2.0
A 4B-sized agentic reasoning model finetuned with our 3k Agentic SFT dataset, based on Qwen3-4B-Instruct-2507. In our work, we systematically investigate three dimensions of agentic RL: data, algorithms, and reasoning modes. Our findings reveal:
🎯 Data Quality Matters: real end-to-end trajectories and high-diversity datasets significantly outperform synthetic alternatives.
⚡ Training Efficiency: exploration-friendly techniques like reward clipping and entropy maintenance boost training efficiency.
🧠 Reasoning Strategy: deliberative reasoning with selective tool calls surpasses frequent invocation or verbose self-reasoning.
We contribute high-quality SFT and RL datasets, demonstrating that simple recipes enable even 4B models to outperform 32B models on the most challenging reasoning benchmarks.
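The exploration-friendly tricks named above can be illustrated with a generic sketch of reward clipping plus an entropy bonus. This is a minimal, hedged illustration of the general techniques, not the authors' exact training recipe; all function names and coefficients here are assumptions.

```python
# Generic sketch of two exploration-friendly RL tricks: reward clipping
# and an entropy bonus. Names and default coefficients are illustrative
# assumptions, not the recipe used for this model.
import math

def clip_reward(r: float, lo: float = -1.0, hi: float = 1.0) -> float:
    """Bound raw rewards so rare outliers cannot dominate the policy update."""
    return max(lo, min(hi, r))

def entropy_bonus(probs: list[float], coef: float = 0.01) -> float:
    """Small shaped term that rewards keeping the action distribution spread out."""
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return coef * h

# A clipped reward plus the entropy term forms the quantity the
# policy gradient step would maximize in this sketch.
shaped = clip_reward(3.7) + entropy_bonus([0.25, 0.25, 0.25, 0.25])
print(shaped)  # 1.0 + 0.01 * ln(4)
```

Clipping keeps a single lucky trajectory from swamping the gradient, while the entropy term discourages premature collapse onto one tool-calling pattern.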
Links
Tags

This model is a fine-tuned version of google/gemma-3-27b-it on the open-r1/codeforces-cots dataset. It has been trained using TRL.
Links
Tags

This model is a fine-tuned version of google/gemma-3-12b-it on the open-r1/codeforces-cots dataset. It has been trained using TRL.
Links
Tags
Repository: localai
License: llama3.1

athirdpath/Llama-3.1-Instruct_NSFW-pretrained_e1-plus_reddit was further trained in the order below:
SFT: Doctor-Shotgun/no-robots-sharegpt, grimulkan/LimaRP-augmented, Inv/c2-logs-cleaned-deslopped
DPO: jondurbin/truthy-dpo-v0.1, Undi95/Weyaxi-humanish-dpo-project-noemoji, athirdpath/DPO_Pairs-Roleplay-Llama3-NSFW
Links
Tags
Repository: localai
License: llama3.1

Tülu3 is a leading instruction following model family, offering fully open-source data, code, and recipes designed to serve as a comprehensive guide for modern post-training techniques. Tülu3 is designed for state-of-the-art performance on a diversity of tasks in addition to chat, such as MATH, GSM8K, and IFEval.
Links
Tags
Repository: localai
License: creativeml-openrail-m
The Llama-3.1-8B-Open-SFT model is a fine-tuned version of meta-llama/Llama-3.1-8B-Instruct, designed for advanced text generation tasks, including conversational interactions, question answering, and chain-of-thought reasoning. This model leverages Supervised Fine-Tuning (SFT) using the O1-OPEN/OpenO1-SFT dataset to provide enhanced performance in context-sensitive and instruction-following tasks.
Links
Tags
Repository: localai
License: apache-2.0
Datasets: stanfordnlp/SHP, Anthropic/hh-rlhf, OpenAssistant/oasst1. This repo contains the model checkpoints for the model family pythia2-8b, optimized with the SFT loss and aligned using the SHP, Anthropic HH, and Open Assistant datasets. Please refer to our [code repository](https://github.com/ContextualAI/HALOs) or [blog](https://contextual.ai/better-cheaper-faster-llm-alignment-with-kto/), which contain instructions for training your own HALOs and links to our model cards.
Links
Tags
Repository: localai
License: apache-2.0
The **Simia-Tau-SFT-Qwen3-8B** is a fine-tuned version of the Qwen3-8B language model, developed by Simia-Agent and adapted for enhanced instruction-following capabilities. This model is optimized for dialogue and task-oriented interactions, making it highly effective for real-world applications requiring nuanced understanding and coherent responses.

The model is available in multiple quantized formats (GGUF), including Q4_K_S, Q5_K_M, Q8_0, and others, enabling efficient deployment across devices with varying computational resources. These quantized versions maintain strong performance while reducing memory footprint and inference latency.

While this repository hosts a quantized variant (specifically designed for GGUF-based inference via tools like llama.cpp), the original base model is **Qwen3-8B**, a large-scale open-source language model from Alibaba Cloud. The fine-tuning (SFT) process improves its alignment with human intent and enhances its ability to follow complex instructions.

> 🔍 **Note**: This is a quantized version; for the full-precision base model, refer to [Simia-Agent/Simia-Tau-SFT-Qwen3-8B](https://huggingface.co/Simia-Agent/Simia-Tau-SFT-Qwen3-8B) on Hugging Face.

**Use Case**: Ideal for chatbots, assistant systems, and interactive applications requiring strong reasoning, safety, and fluency.
**Model Size**: 8B parameters (quantized for efficiency).
**License**: See the original model's license (typically Apache 2.0 for Qwen series).

👉 Recommended for edge deployment with GGUF-compatible tools.
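To pick among the quant types listed above, a back-of-the-envelope memory estimate is often enough. The bits-per-weight figures below are approximate conventions for each GGUF quant type, not measured values for this repository.

```python
# Rough GGUF memory-footprint estimate for an 8B-parameter model.
# Bits-per-weight values are approximate conventions for each quant
# type (assumptions), not measurements of this specific repo.
APPROX_BITS_PER_WEIGHT = {
    "Q4_K_S": 4.5,
    "Q5_K_M": 5.5,
    "Q8_0": 8.5,
    "F16": 16.0,
}

def approx_size_gib(n_params: float, quant: str) -> float:
    """Approximate on-disk/in-memory weight size in GiB."""
    bits = APPROX_BITS_PER_WEIGHT[quant]
    return n_params * bits / 8 / 1024**3

for quant in ("Q4_K_S", "Q5_K_M", "Q8_0", "F16"):
    print(f"{quant}: ~{approx_size_gib(8e9, quant):.1f} GiB")
```

By this estimate an 8B model shrinks from roughly 15 GiB at F16 to around 4 GiB at Q4_K_S, which is what makes the smaller quants viable on consumer and edge hardware (KV-cache and runtime overhead come on top of these figures).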
Links
Tags