Repository: localai
License: gemma

Google Gemma 4 E2B-IT served by SGLang with Multi-Token Prediction (MTP) speculative decoding. The companion drafter google/gemma-4-E2B-it-assistant lets the target accept several tokens per step. Flags are a 1:1 transcription of the SGLang cookbook's MTP command (NEXTN algorithm, num_steps=5, num_draft_tokens=6, eagle_topk=1, mem_fraction_static=0.85). The E2B variant has 5B total / 2B effective parameters and targets the smaller end of consumer GPUs.
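A minimal launch sketch of that cookbook command, assuming the target checkpoint id `google/gemma-4-E2B-it` (only the drafter id is given above) and SGLang's standard speculative-decoding flag names:

```bash
python -m sglang.launch_server \
  --model-path google/gemma-4-E2B-it \
  --speculative-draft-model-path google/gemma-4-E2B-it-assistant \
  --speculative-algorithm NEXTN \
  --speculative-num-steps 5 \
  --speculative-num-draft-tokens 6 \
  --speculative-eagle-topk 1 \
  --mem-fraction-static 0.85
```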
Repository: localai
License: gemma

Google Gemma 4 E4B-IT served by SGLang with Multi-Token Prediction (MTP) speculative decoding. The companion drafter google/gemma-4-E4B-it-assistant lets the target accept several tokens per step. Flags are a 1:1 transcription of the SGLang cookbook's MTP command (NEXTN algorithm, num_steps=5, num_draft_tokens=6, eagle_topk=1, mem_fraction_static=0.85). The E4B variant has 8B total / 4B effective parameters — the natural pick for consumer GPUs in the 16–24 GB range.
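The launch line is the same sketch as for E2B above, with the E4B ids swapped in (the target id is again assumed from the drafter's naming):

```bash
python -m sglang.launch_server \
  --model-path google/gemma-4-E4B-it \
  --speculative-draft-model-path google/gemma-4-E4B-it-assistant \
  --speculative-algorithm NEXTN --speculative-num-steps 5 \
  --speculative-num-draft-tokens 6 --speculative-eagle-topk 1 \
  --mem-fraction-static 0.85
```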
Repository: localai
License: mit

Xiaomi MiMo-7B-RL served by SGLang with built-in Multi-Token Prediction (MTP) heads (no separate drafter needed), plus online fp8 weight quantization to fit a 16 GB consumer GPU. The model card reports ~90% draft-token acceptance. Verified end-to-end at ~88 tok/s on an RTX 5070 Ti (16 GB). Note: mem_fraction_static is dropped to 0.7 (vs SGLang's 0.85 default) because the MTP draft worker's vocab embedding is loaded unquantized (~1.2 GiB) and otherwise OOMs the static reservation.
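A hedged launch sketch: the fp8 quantization and the 0.7 memory fraction come from the note above, while the speculative flags are an assumption, since MiMo's MTP heads are built in and SGLang exposes them through its EAGLE-style speculative options (exact flag values can differ across SGLang versions):

```bash
# --quantization fp8 and --mem-fraction-static 0.7 are taken from the card;
# the speculative flag below is assumed, not verified against this config.
python -m sglang.launch_server \
  --model-path XiaomiMiMo/MiMo-7B-RL \
  --trust-remote-code \
  --quantization fp8 \
  --mem-fraction-static 0.7 \
  --speculative-algorithm EAGLE
```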
Repository: localai
License: apache-2.0
**Model Name:** Orca-Agent-v0.1
**Base Model:** Qwen3-14B
**Repository:** [Danau5tin/Orca-Agent-v0.1](https://huggingface.co/Danau5tin/Orca-Agent-v0.1)
**License:** Apache 2.0
**Use Case:** Multi-Agent Orchestration for Complex Code & System Tasks

---

### 🔍 **Overview**

Orca-Agent-v0.1 is a powerful **task orchestration agent** designed to manage complex, multi-step workflows, especially in code and system administration, without directly modifying code. Instead, it acts as a strategic planner that coordinates a team of specialized agents.

---

### 🛠️ **Key Features**

- **Intelligent Task Breakdown:** Analyzes user requests and decomposes them into focused subtasks.
- **Agent Coordination:** Dynamically dispatches:
  - *Explorer agents* to understand the system state.
  - *Coder agents* to implement changes with precise instructions.
  - *Verifier agents* to validate results.
- **Context Management:** Maintains a persistent context store to track discoveries across steps.
- **High Performance:** Achieves **18.25% on TerminalBench** when paired with Qwen3-Coder-30B, nearing the performance of a 480B model.

---

### 📊 **Performance**

| Orchestrator | Subagent | Terminal Bench |
|--------------|----------|----------------|
| Orca-Agent-v0.1-14B | Qwen3-Coder-30B | **18.25%** |
| Qwen3-14B | Qwen3-Coder-30B | 7.0% |

> *Trained on 32x H100s using GRPO + curriculum learning, with full open-source training code available.*

---

### 📌 **Example Output**

```yaml
agent_type: 'coder'
title: 'Attempt recovery using the identified backup file'
description: |
  Move the backup file from /tmp/terraform_work/.terraform.tfstate.tmp
  to /infrastructure/recovered_state.json. Verify file existence, size,
  and permissions (rw-r--r--).
max_turns: 10
context_refs: ['task_003']
```

---

### 📁 **Serving**

- ✅ **vLLM:** `vllm serve Danau5tin/Orca-Agent-v0.1`
- ✅ **SGLang:** `python -m sglang.launch_server --model-path Danau5tin/Orca-Agent-v0.1` (request sketch below)

---

### 🌐 **Learn More**

- **Training & Code:** [GitHub - Orca-Agent-RL](https://github.com/Danau5tin/Orca-Agent-RL)
- **Orchestration Framework:** [multi-agent-coding-system](https://github.com/Danau5tin/multi-agent-coding-system)

---

> ✅ *Note: The model at `mradermacher/Orca-Agent-v0.1-i1-GGUF` is a quantized version of this original model. This description reflects the full, unquantized version by the original author.*
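Once either server from the Serving section is running, it exposes the OpenAI-compatible chat API; a minimal request sketch (vLLM defaults to port 8000, SGLang to 30000; the prompt is illustrative only):

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Danau5tin/Orca-Agent-v0.1",
    "messages": [
      {"role": "user",
       "content": "Recover the Terraform state backup in /tmp/terraform_work and verify permissions."}
    ],
    "max_tokens": 512
  }'
```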