Model Gallery

3 models from 1 repository

qwen3.6-27b-heretic-uncensored-finetune-neo-code-di-imatrix-max
Qwen3.6-27B-Heretic2-Uncensored-Finetune-Thinking. Yes... fully uncensored AND lightly fine-tuned. Freedom and brainpower. Trained on a different Heretic base, with different KLD/refusal settings. A fine-tune was used to finalize and "firm up" the Heretic / uncensored changes; the goal was light, minor fixes rather than a full / heavy fine-tune. Even so, the tuning still raised critical metrics. This is Version 2, built on the "trohrbaugh" Heretic, which has a lower refusal rate, and the tuning bumped the metrics up a bit more. This has also positively impacted the "NEO-Coder Di-Matrix" (dual imatrix) GGUF quants (vs. the Heretic and non-Heretic versions). https://huggingface.co/DavidAU/Qwen3.6-27B-Heretic-Uncensored-FINETUNE-NEO-CODE-Di-IMatrix-MAX-GGUF

```
IN-HOUSE BENCHMARKS [by Nightmedia]
Columns: arc-c, arc/e, boolq, hswag, obkqa, piqa, wino

Qwen3.6-27B-Heretic2-Uncensored-Finetune-Thinking (mxfp8, instruct mode):
  0.673, 0.846, 0.905, ...

Qwen3.6-27B-Heretic-Uncensored-Finetune-Thinking (mxfp8, instruct mode):
  0.669, 0.835, 0.906, ...

BASE UNTUNED MODEL: Qwen3.6-27B HERETIC (by llmfan46) (mxfp8, instruct mode):
  0.644, 0.788, 0.902, ...
```

Repository: localai | License: apache-2.0
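
For completeness, a minimal sketch of loading one of the GGUF quants from the linked repo with llama-cpp-python; the quant filename, context size, and sampling settings below are illustrative assumptions, not taken from the model card.

```python
# Minimal llama-cpp-python sketch for a GGUF quant from the linked repo.
# The filename below is hypothetical: pick an actual quant file from
# https://huggingface.co/DavidAU/Qwen3.6-27B-Heretic-Uncensored-FINETUNE-NEO-CODE-Di-IMatrix-MAX-GGUF
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen3.6-27B-Heretic2-Q4_K_M.gguf",  # hypothetical filename
    n_ctx=8192,        # context window; adjust to taste/VRAM
    n_gpu_layers=-1,   # offload all layers to GPU if they fit
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write a Python function that reverses a string."}],
    temperature=0.7,
)
print(out["choices"][0]["message"]["content"])
```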

invisietch_l3.3-ignition-v0.1-70b
Ignition v0.1 is a Llama 3.3-based model merge designed for creative roleplay and fiction writing. The model underwent a multi-stage merge process intended to optimise creative-writing capability, minimise slop, and improve coherence compared with its constituent models. It shows a preference for detailed character cards and is sensitive to detailed system prompting: if you want a specific behavior from the model, prompt for it directly (see the sketch below). Inference has been tested at fp8 and fp16; both are coherent up to ~64k context.

Repository: localai | License: llama3.3
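
As a sketch of prompting for behavior directly, here is a minimal example against an OpenAI-compatible endpoint such as the one LocalAI exposes; the base URL, model name, and prompt wording are assumptions for illustration, not part of the model card.

```python
# Steering Ignition with an explicit, detailed system prompt via an
# OpenAI-compatible endpoint (e.g. LocalAI). The base_url, model name,
# and prompt text are illustrative assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

system_prompt = (
    "You are Mara, a terse, sardonic smuggler. Stay in character, "
    "write in third-person past tense, and keep replies under 150 words."
)

resp = client.chat.completions.create(
    model="invisietch_l3.3-ignition-v0.1-70b",  # gallery model id; verify locally
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "Mara inspects the cargo manifest."},
    ],
    temperature=0.8,
)
print(resp.choices[0].message.content)
```

Stating the desired register, tense, and length directly in the system prompt is exactly the kind of explicit instruction the model card says this merge responds to.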

mimo-7b-mtp:sglang
Xiaomi MiMo-7B-RL served by SGLang using the model's built-in Multi-Token Prediction (MTP) heads (no separate draft model needed), plus online fp8 weight quantization to fit on a 16 GB consumer GPU. The model card reports ~90% draft acceptance; verified end-to-end at ~88 tok/s on an RTX 5070 Ti (16 GB). Note: mem_fraction_static is lowered to 0.7 (vs SGLang's 0.85 default) because the MTP draft worker's vocab embedding is loaded unquantised (~1.2 GiB) and would otherwise OOM the static memory reservation.

Repository: localai | License: mit
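
A back-of-the-envelope check on that ~1.2 GiB figure, as a minimal Python sketch. The vocabulary size and hidden dimension below are assumed Qwen-style values, not read from the MiMo config, so treat the result as illustrative only.

```python
# Rough size of the MTP draft worker's unquantised vocab embedding.
# ASSUMPTIONS: Qwen-style vocab (~151,936 tokens), hidden size 4096,
# 2 bytes/param (bf16). None of these are taken from the MiMo config.
vocab_size = 151_936
hidden_size = 4096
bytes_per_param = 2  # bf16 / fp16

embedding_gib = vocab_size * hidden_size * bytes_per_param / 2**30
print(f"vocab embedding ≈ {embedding_gib:.2f} GiB")  # ≈ 1.16 GiB
```

Dropping mem_fraction_static from 0.85 to 0.7 hands roughly that much of a 16 GB card back to non-static allocations, which is why the lower setting avoids the OOM.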