Model Gallery

25 models from 1 repository


qwen3-tts-cpp
Qwen3-TTS 0.6B (C++ / GGML) — native C++ text-to-speech from text input. Generates 24kHz mono audio. Supports 10 languages (en, zh, ja, ko, de, fr, es, it, pt, ru). Uses F16 GGUF models (~2 GB total).
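
As a minimal sketch of driving a TTS backend like this one, the snippet below builds a request body for LocalAI's text-to-speech endpoint and writes the returned audio to disk. The endpoint path, host, and the `language` field are assumptions here; check your LocalAI instance's documentation for the exact request shape.

```python
import json
from urllib import request

# Assumed endpoint/host: adjust to wherever your LocalAI instance runs.
LOCALAI_URL = "http://localhost:8080/tts"

SUPPORTED_LANGUAGES = {"en", "zh", "ja", "ko", "de", "fr", "es", "it", "pt", "ru"}

def build_tts_request(text: str, model: str = "qwen3-tts-cpp",
                      language: str = "en") -> dict:
    """Build a TTS request body ({"model": ..., "input": ...}).
    The 'language' field is an assumption; verify against the backend docs."""
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {language}")
    return {"model": model, "input": text, "language": language}

def synthesize(text: str, out_path: str = "out.wav") -> None:
    """POST the request and write the returned 24 kHz mono audio to disk."""
    body = json.dumps(build_tts_request(text)).encode()
    req = request.Request(LOCALAI_URL, data=body,
                         headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp, open(out_path, "wb") as f:
        f.write(resp.read())

# synthesize("Hello from Qwen3-TTS")  # requires a running LocalAI instance
```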

Repository: localai · License: apache-2.0

qwen3-tts-cpp-customvoice
Qwen3-TTS 0.6B Custom Voice (C++ / GGML) — text-to-speech with voice cloning support. Generates 24kHz mono audio with optional reference audio for voice cloning via ECAPA-TDNN speaker embeddings. Supports 10 languages (en, zh, ja, ko, de, fr, es, it, pt, ru).

Repository: localai · License: apache-2.0

glm-4.7-flash-derestricted
This model is a quantized version of the base model `koute/GLM-4.7-Flash-Derestricted`. Its tags ("derestricted," "uncensored," and "unlimited") indicate that the usual content restrictions have been removed. The quantized versions (e.g., Q2_K, Q4_K_S, Q6_K) offer varying trade-offs between accuracy and efficiency, with the Q4_K_S and Q6_K variants recommended for balanced performance. The model is optimized for fast inference and supports multiple quantization schemes, though some advanced quantization options (like IQ4_XS) are not available.
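
To make the quantization trade-off concrete, a GGUF file's size can be roughly estimated from the parameter count and the quant scheme's bits per weight. The figures below are approximate community-reported values for llama.cpp k-quants, not exact; real file sizes also include metadata and vary with tensor layout.

```python
# Rough bits-per-weight figures for common llama.cpp k-quants
# (approximate; actual GGUF sizes vary with tensor layout and metadata).
BITS_PER_WEIGHT = {
    "Q2_K": 2.6,
    "Q4_K_S": 4.6,
    "Q6_K": 6.6,
    "F16": 16.0,
}

def estimate_gguf_size_gb(n_params: float, quant: str) -> float:
    """Estimate GGUF file size in GB for a model with n_params weights."""
    bpw = BITS_PER_WEIGHT[quant]
    return n_params * bpw / 8 / 1e9

# For a 7B model this gives roughly 2.3 GB at Q2_K, 4.0 GB at Q4_K_S,
# and 5.8 GB at Q6_K, which is why the middle quants are the usual
# accuracy/size compromise.
```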

Repository: localai · License: mit

aurore-reveil_koto-small-7b-it
Koto-Small-7B-IT is an instruct-tuned version of Koto-Small-7B-PT, which was trained on MiMo-7B-Base for almost a billion tokens of creative-writing data. This model is meant for roleplaying and instruct usecases.

Repository: localai · License: mit

kokoro
Kokoro is an open-weight TTS model with 82 million parameters. Despite its lightweight architecture, it delivers quality comparable to larger models while being significantly faster and more cost-efficient. With Apache-licensed weights, Kokoro can be deployed anywhere from production environments to personal projects.

Repository: localai · License: apache-2.0

kokoros
Kokoros is a pure Rust TTS backend using the Kokoro v1.0 ONNX model (82M parameters). Fast, streaming TTS with high quality. American English with af_heart voice.

Repository: localai · License: apache-2.0

kokoros-ja
Kokoros Rust TTS - Japanese. Uses the Kokoro v1.0 ONNX model with Japanese phonemization.

Repository: localai · License: apache-2.0

kokoros-cmn
Kokoros Rust TTS - Mandarin Chinese.

Repository: localai · License: apache-2.0

kokoros-de
Kokoros Rust TTS - German.

Repository: localai · License: apache-2.0

openbuddy_openbuddy-r1-0528-distill-qwen3-32b-preview0-qat
OpenBuddy distillation of Qwen3-32B from DeepSeek-R1, featuring 40K context window and multilingual support (zh, en, fr, de, ja, ko, it, fi). GGUF quantized version optimized for local inference with llama.cpp.

Repository: localai · License: apache-2.0

l3.3-prikol-70b-v0.2
A merge of some Llama 3.3 models, because, um, uh, yeah. Went extra schizo on the recipe, hoping for an extra fun result, and... well, I guess it's an overall improvement over the previous revision. It's a tiny bit smarter and has even more distinct swipes and nice dialogues, but for some reason it's damn sloppy. I've published the second step of this merge as a separate model (https://huggingface.co/Nohobby/AbominationSnowPig); I'd say its results are more interesting, but not as usable as this one. Prompt format: Llama3, or Llama3 Context with ChatML Instruct; it actually works a bit better this way.

Repository: localai · License: llama3.3

nohobby_l3.3-prikol-70b-v0.4
I have yet to try it. UPD: it sucks, bleh. Sometimes it mistakes {{user}} for {{char}} and can't think. Other than that, the behavior is similar to its predecessors. It sometimes gives some funny replies tho, yay!

Repository: localai · License: llama3.3

nohobby_l3.3-prikol-70b-v0.5
99% of mergekit addicts quit before they hit it big. Gosh, I need to create an org for my test runs - my profile looks like a dumpster. What was it again? Ah, the new model. Exactly what I wanted. All I had to do was yank out the cursed official DeepSeek distill and here we are. From the brief tests it gave me some unusual takes on the character cards I'm used to. Just this makes it worth it imo. Also the writing is kinda nice.

Repository: localai · License: llama3.3

nohobby_l3.3-prikol-70b-extra
After banging my head against the wall some more, I actually managed to merge the DeepSeek distill into my mess! Along with even more models (my hand just slipped, I swear). The prose is better than in v0.5 but has a different feel to it, so I guess it's more of a step to the side than forward (hence the title EXTRA instead of 0.6). The context recall may have improved, or I'm just gaslighting myself into thinking so. And of course, since it now has DeepSeek in it: tags! They kinda work out of the box if you add to the 'Start Reply With' field in ST; that way the model will write a really short character thought in it. However, if we want some OOC reasoning, things get trickier. My initial thought was that this model could be instructed to use it either only for {{char}}'s inner monologue or for detached analysis, but in practice it would end up writing character thoughts most of the time anyway, and the times when it did reason it threw the narrative out the window by making it too formal, even adding some notes at the end.

Repository: localai · License: llama3.3

loki-v2.6-8b-1024k
The following models were included in the merge: MrRobotoAI/Epic_Fiction-8b MrRobotoAI/Unaligned-RP-Base-8b-1024k MrRobotoAI/Loki-.Epic_Fiction.-8b Casual-Autopsy/L3-Luna-8B Casual-Autopsy/L3-Super-Nova-RP-8B Casual-Autopsy/L3-Umbral-Mind-RP-v3.0-8B Casual-Autopsy/Halu-L3-Stheno-BlackOasis-8B Undi95/Llama-3-LewdPlay-8B Undi95/Llama-3-LewdPlay-8B-evo Undi95/Llama-3-Unholy-8B ChaoticNeutrals/Hathor_Tahsin-L3-8B-v0.9 ChaoticNeutrals/Hathor_RP-v.01-L3-8B ChaoticNeutrals/Domain-Fusion-L3-8B ChaoticNeutrals/T-900-8B ChaoticNeutrals/Poppy_Porpoise-1.4-L3-8B ChaoticNeutrals/Templar_v1_8B ChaoticNeutrals/Hathor_Respawn-L3-8B-v0.8 ChaoticNeutrals/Sekhmet_Gimmel-L3.1-8B-v0.3 zeroblu3/LewdPoppy-8B-RP tohur/natsumura-storytelling-rp-1.0-llama-3.1-8b jeiku/Chaos_RP_l3_8B tannedbum/L3-Nymeria-Maid-8B Nekochu/Luminia-8B-RP vicgalle/Humanish-Roleplay-Llama-3.1-8B saishf/SOVLish-Maid-L3-8B Dogge/llama-3-8B-instruct-Bluemoon-Freedom-RP MrRobotoAI/Epic_Fiction-8b-v4 maldv/badger-lambda-0-llama-3-8b maldv/llama-3-fantasy-writer-8b maldv/badger-kappa-llama-3-8b maldv/badger-mu-llama-3-8b maldv/badger-lambda-llama-3-8b maldv/badger-iota-llama-3-8b maldv/badger-writer-llama-3-8b Magpie-Align/MagpieLM-8B-Chat-v0.1 nbeerbower/llama-3-gutenberg-8B nothingiisreal/L3-8B-Stheno-Horny-v3.3-32K nbeerbower/llama-3-spicy-abliterated-stella-8B Magpie-Align/MagpieLM-8B-SFT-v0.1 NeverSleep/Llama-3-Lumimaid-8B-v0.1 mlabonne/NeuralDaredevil-8B-abliterated mlabonne/Daredevil-8B-abliterated NeverSleep/Llama-3-Lumimaid-8B-v0.1-OAS nothingiisreal/L3-8B-Instruct-Abliterated-DWP openchat/openchat-3.6-8b-20240522 turboderp/llama3-turbcat-instruct-8b UCLA-AGI/Llama-3-Instruct-8B-SPPO-Iter3 Undi95/Llama-3-LewdPlay-8B TIGER-Lab/MAmmoTH2-8B-Plus OwenArli/Awanllm-Llama-3-8B-Cumulus-v1.0 refuelai/Llama-3-Refueled SicariusSicariiStuff/LLAMA-3_8B_Unaligned_Alpha NousResearch/Hermes-2-Theta-Llama-3-8B ResplendentAI/Nymph_8B grimjim/Llama-3-Oasis-v1-OAS-8B flammenai/Mahou-1.3b-llama3-8B lemon07r/Llama-3-RedMagic4-8B 
grimjim/Llama-3.1-SuperNova-Lite-lorabilterated-8B grimjim/Llama-Nephilim-Metamorphosis-v2-8B lemon07r/Lllama-3-RedElixir-8B grimjim/Llama-3-Perky-Pat-Instruct-8B ChaoticNeutrals/Hathor_RP-v.01-L3-8B grimjim/llama-3-Nephilim-v2.1-8B ChaoticNeutrals/Hathor_Respawn-L3-8B-v0.8 migtissera/Llama-3-8B-Synthia-v3.5 Locutusque/Llama-3-Hercules-5.0-8B WhiteRabbitNeo/Llama-3-WhiteRabbitNeo-8B-v2.0 VAGOsolutions/Llama-3-SauerkrautLM-8b-Instruct iRyanBell/ARC1-II HPAI-BSC/Llama3-Aloe-8B-Alpha HaitameLaf/Llama-3-8B-StoryGenerator failspy/Meta-Llama-3-8B-Instruct-abliterated-v3 Undi95/Llama-3-Unholy-8B ajibawa-2023/Uncensored-Frank-Llama-3-8B ajibawa-2023/SlimOrca-Llama-3-8B ChaoticNeutrals/Templar_v1_8B aifeifei798/llama3-8B-DarkIdol-2.2-Uncensored-1048K ChaoticNeutrals/Hathor_Tahsin-L3-8B-v0.9 Blackroot/Llama-3-Gamma-Twist FPHam/L3-8B-Everything-COT Blackroot/Llama-3-LongStory ChaoticNeutrals/Sekhmet_Gimmel-L3.1-8B-v0.3 abacusai/Llama-3-Smaug-8B Khetterman/CursedMatrix-8B-v9 ajibawa-2023/Scarlett-Llama-3-8B-v1.0 MrRobotoAI/Unaligned-RP-Base-8b-1024k + surya-narayanan/physics_non_masked MrRobotoAI/Unaligned-RP-Base-8b-1024k + surya-narayanan/electrical_engineering MrRobotoAI/Unaligned-RP-Base-8b-1024k + surya-narayanan/college_chemistry MrRobotoAI/Unaligned-RP-Base-8b-1024k + surya-narayanan/philosophy_non_masked MrRobotoAI/Unaligned-RP-Base-8b-1024k + surya-narayanan/college_physics MrRobotoAI/Unaligned-RP-Base-8b-1024k + surya-narayanan/philosophy MrRobotoAI/Unaligned-RP-Base-8b-1024k + surya-narayanan/formal_logic MrRobotoAI/Unaligned-RP-Base-8b-1024k + surya-narayanan/philosophy_100 MrRobotoAI/Unaligned-RP-Base-8b-1024k + surya-narayanan/conceptual_physics MrRobotoAI/Unaligned-RP-Base-8b-1024k + surya-narayanan/college_computer_science MrRobotoAI/Unaligned-RP-Base-8b-1024k + surya-narayanan/psychology_non_masked MrRobotoAI/Unaligned-RP-Base-8b-1024k + surya-narayanan/psychology MrRobotoAI/Unaligned-RP-Base-8b-1024k + Blackroot/Llama3-RP-Lora 
MrRobotoAI/Unaligned-RP-Base-8b-1024k + Azazelle/Llama-3-LimaRP-Instruct-LoRA-8B MrRobotoAI/Unaligned-RP-Base-8b-1024k + nothingiisreal/llama3-8B-DWP-lora MrRobotoAI/Unaligned-RP-Base-8b-1024k + surya-narayanan/world_religions MrRobotoAI/Unaligned-RP-Base-8b-1024k + surya-narayanan/high_school_european_history MrRobotoAI/Unaligned-RP-Base-8b-1024k + surya-narayanan/electrical_engineering MrRobotoAI/Unaligned-RP-Base-8b-1024k + Azazelle/Llama-3-8B-Abomination-LORA MrRobotoAI/Unaligned-RP-Base-8b-1024k + Azazelle/Llama-3-LongStory-LORA MrRobotoAI/Unaligned-RP-Base-8b-1024k + surya-narayanan/human_sexuality MrRobotoAI/Unaligned-RP-Base-8b-1024k + surya-narayanan/sociology MrRobotoAI/Unaligned-RP-Base-8b-1024k + ResplendentAI/Theory_of_Mind_Llama3 MrRobotoAI/Unaligned-RP-Base-8b-1024k + Azazelle/Smarts_Llama3 MrRobotoAI/Unaligned-RP-Base-8b-1024k + Azazelle/Llama-3-LongStory-LORA MrRobotoAI/Unaligned-RP-Base-8b-1024k + Azazelle/Nimue-8B MrRobotoAI/Unaligned-RP-Base-8b-1024k + vincentyandex/lora_llama3_chunked_novel_bs128 MrRobotoAI/Unaligned-RP-Base-8b-1024k + ResplendentAI/Aura_Llama3 MrRobotoAI/Unaligned-RP-Base-8b-1024k + Azazelle/L3-Daybreak-8b-lora MrRobotoAI/Unaligned-RP-Base-8b-1024k + ResplendentAI/Luna_Llama3 MrRobotoAI/Unaligned-RP-Base-8b-1024k + nicce/story-mixtral-8x7b-lora MrRobotoAI/Unaligned-RP-Base-8b-1024k + Blackroot/Llama-3-LongStory-LORA MrRobotoAI/Unaligned-RP-Base-8b-1024k + ResplendentAI/NoWarning_Llama3 MrRobotoAI/Unaligned-RP-Base-8b-1024k + ResplendentAI/BlueMoon_Llama3

Repository: localai · License: llama3.1

llmevollama-3.1-8b-v0.1-i1
This project aims to optimize model merging by integrating LLMs into evolutionary strategies in a novel way. Instead of using the CMA-ES approach, the goal is to improve model optimization by leveraging the search capabilities of LLMs to explore the parameter space more efficiently and adjust the search scope based on high-performing solutions. Currently, the project supports optimization only within the Parameter Space, but I plan to extend its functionality to enable merging and optimization in the Data Flow Space as well. This will further enhance model merging by optimizing the interaction between data flow and parameters.
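The search loop described above can be illustrated with a deliberately tiny toy. Here two "models" are flat parameter vectors, a candidate merge is a scalar interpolation weight, and a simple (1+1) evolutionary step stands in for the project's proposal mechanism (the real project uses LLM-guided proposals instead of plain random mutation; the target vector and fitness function are invented for illustration).

```python
import random

def merge(a, b, w):
    """Interpolate two flat parameter vectors with weight w on model a."""
    return [w * x + (1 - w) * y for x, y in zip(a, b)]

def fitness(params, target):
    """Toy fitness: negative squared distance to a known-good target."""
    return -sum((p - t) ** 2 for p, t in zip(params, target))

def evolve(a, b, target, generations=200, sigma=0.1, seed=0):
    """(1+1)-ES over the merge weight: mutate, keep the better candidate."""
    rng = random.Random(seed)
    w = 0.5  # start from an even merge
    best = fitness(merge(a, b, w), target)
    for _ in range(generations):
        cand = min(1.0, max(0.0, w + rng.gauss(0, sigma)))
        score = fitness(merge(a, b, cand), target)
        if score >= best:
            w, best = cand, score
    return w

model_a = [1.0, 2.0, 3.0]
model_b = [3.0, 0.0, 1.0]
target = [1.5, 1.5, 2.5]  # optimum lies at w = 0.75
w_star = evolve(model_a, model_b, target)
```

The same skeleton extends to per-layer weight vectors; only the mutation/proposal step changes when an LLM suggests candidates instead of a Gaussian perturbation.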

Repository: localai · License: llama3.1

hamanasu-adventure-4b-i1
Thanks to PocketDoc's Adventure datasets, and taking his Dangerous Winds models as inspiration, I was able to finetune a small Adventure model that HATES the User. The model is suited for Text Adventure. All thanks to Tav for funding the train. Support me and my finetunes on Ko-Fi: https://ko-fi.com/deltavector

Repository: localai · License: llama3.1

hamanasu-magnum-4b-i1
This is a model designed to replicate the prose quality of the Claude 3 series of models, specifically Sonnet and Opus. Made with a prototype Magnum V5 datamix. The model is suited for traditional RP. All thanks to Tav for funding the train. Support me and my finetunes on Ko-Fi: https://ko-fi.com/deltavector

Repository: localai · License: llama3.1

facebook_kernelllm
We introduce KernelLLM, a large language model based on Llama 3.1 Instruct that has been trained specifically for the task of authoring GPU kernels using Triton. KernelLLM translates PyTorch modules into Triton kernels and was evaluated on KernelBench-Triton. KernelLLM aims to democratize GPU programming by making kernel development more accessible and efficient.

KernelLLM's vision is to meet the growing demand for high-performance GPU kernels by automating the generation of efficient Triton implementations. As workloads grow larger and more diverse accelerator architectures emerge, the need for tailored kernel solutions has increased significantly. Although a number of related works exist, most are limited to test-time optimization, while others tune on solutions traced from KernelBench problems themselves, limiting how informative the results are about out-of-distribution generalization. To the best of our knowledge, KernelLLM is the first LLM finetuned on external (torch, triton) pairs, and we hope that making our model available can accelerate progress towards intelligent kernel authoring systems.

KernelLLM workflow for Triton kernel generation: our approach uses KernelLLM to translate PyTorch code into Triton kernel candidates. The generations are validated against unit tests, which run the kernels with random inputs of known shapes. This workflow allows us to evaluate multiple generations (pass@k) by increasing the number of kernel candidates; the best kernel implementation is selected and returned.

The model was trained on approximately 25,000 paired examples of PyTorch modules and their equivalent Triton kernel implementations, plus additional synthetically generated samples. Our approach combines filtered code from TheStack [Kocetkov et al. 2022] and synthetic examples generated through torch.compile() and additional prompting techniques. The filtered and compiled dataset is [KernelBook](https://huggingface.co/datasets/GPUMODE/KernelBook).

We finetuned Llama3.1-8B-Instruct on the created dataset using supervised instruction tuning and measured its ability to generate correct Triton kernels and corresponding calling code on KernelBench-Triton, our newly created variant of KernelBench [Ouyang et al. 2025] targeting Triton kernel generation. The torch code was used with a prompt template containing a format example as instruction during both training and evaluation. The model was trained for 10 epochs with a batch size of 32 and a standard SFT recipe, with hyperparameters selected by perplexity on a held-out subset of the training data. Training took circa 12 hours wall-clock time on 16 GPUs (192 GPU hours), and we report the best checkpoint's validation results.
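The pass@k evaluation mentioned above is conventionally computed with the unbiased estimator of Chen et al. (2021): given n generated candidates of which c pass the unit tests, it estimates the probability that at least one of k sampled candidates passes. A direct implementation:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021).

    n: number of generated kernel candidates
    c: number of candidates that pass the unit tests
    k: sample size for the metric
    """
    if n - c < k:
        # Fewer than k failing candidates exist, so any k-sample
        # must contain at least one passing candidate.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. with 10 candidates and 3 passing, pass@1 = 0.3
```

Raising the number of candidate generations n tightens the estimate for a fixed k, which is exactly why the workflow above generates multiple kernels per problem.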

Repository: localai · License: llama3.1

yi-coder-9b-chat
Yi-Coder is a series of open-source code language models that delivers state-of-the-art coding performance with fewer than 10 billion parameters. Key features: Excelling in long-context understanding with a maximum context length of 128K tokens. Supporting 52 major programming languages: 'java', 'markdown', 'python', 'php', 'javascript', 'c++', 'c#', 'c', 'typescript', 'html', 'go', 'java_server_pages', 'dart', 'objective-c', 'kotlin', 'tex', 'swift', 'ruby', 'sql', 'rust', 'css', 'yaml', 'matlab', 'lua', 'json', 'shell', 'visual_basic', 'scala', 'rmarkdown', 'pascal', 'fortran', 'haskell', 'assembly', 'perl', 'julia', 'cmake', 'groovy', 'ocaml', 'powershell', 'elixir', 'clojure', 'makefile', 'coffeescript', 'erlang', 'lisp', 'toml', 'batchfile', 'cobol', 'dockerfile', 'r', 'prolog', 'verilog' For model details and benchmarks, see Yi-Coder blog and Yi-Coder README.

Repository: localai · License: apache-2.0

yi-coder-1.5b-chat
Yi-Coder is a series of open-source code language models that delivers state-of-the-art coding performance with fewer than 10 billion parameters. Key features: Excelling in long-context understanding with a maximum context length of 128K tokens. Supporting 52 major programming languages: 'java', 'markdown', 'python', 'php', 'javascript', 'c++', 'c#', 'c', 'typescript', 'html', 'go', 'java_server_pages', 'dart', 'objective-c', 'kotlin', 'tex', 'swift', 'ruby', 'sql', 'rust', 'css', 'yaml', 'matlab', 'lua', 'json', 'shell', 'visual_basic', 'scala', 'rmarkdown', 'pascal', 'fortran', 'haskell', 'assembly', 'perl', 'julia', 'cmake', 'groovy', 'ocaml', 'powershell', 'elixir', 'clojure', 'makefile', 'coffeescript', 'erlang', 'lisp', 'toml', 'batchfile', 'cobol', 'dockerfile', 'r', 'prolog', 'verilog' For model details and benchmarks, see Yi-Coder blog and Yi-Coder README.

Repository: localai · License: apache-2.0
