Repository: localai
License: apache-2.0
A man-made horror beyond your comprehension. But no, seriously, this is my experiment to: (1) measure the probability that any given expert activates, per layer, over my personal set of fairly diverse calibration data, and (2) prune the 64 least-used of the 128 experts in each layer (with the router and expert indexing reordered per layer accordingly). It can still write semi-coherently without any additional training or distillation on top of the original 30B MoE. The .txt files with the original measurements are provided in the repo along with the exported weights. The expert measurements were collected with a hacked version of vLLM, and a bespoke script then selectively exported the weights according to those measurements.
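For illustration, here is a minimal sketch of the measure-then-prune step described above. It is not the author's actual script: it assumes you have already dumped, per layer, the router's top-k expert indices for every calibration token, and all names are illustrative.

```python
# Sketch: count expert activations per layer, keep the 64 most-used experts,
# and build an old->new index remapping for reordering the router and weights.
import numpy as np

NUM_EXPERTS = 128   # experts per layer in the original MoE
KEEP = 64           # experts to keep per layer after pruning

def experts_to_keep(router_topk_indices: np.ndarray, keep: int = KEEP) -> np.ndarray:
    """Return the `keep` most frequently activated expert ids for one layer."""
    counts = np.bincount(router_topk_indices.ravel(), minlength=NUM_EXPERTS)
    return np.sort(np.argsort(counts)[::-1][:keep])

def remap_table(kept: np.ndarray) -> dict[int, int]:
    """Old expert id -> new contiguous id, used to reorder router rows and expert weights."""
    return {int(old): new for new, old in enumerate(kept)}

# Example with fake routing data for one layer (replace with real dumped indices).
rng = np.random.default_rng(0)
fake_indices = rng.integers(0, NUM_EXPERTS, size=(10_000, 8))
kept = experts_to_keep(fake_indices)
mapping = remap_table(kept)
# When exporting, copy only the kept experts' weights and reorder the router's
# rows to match `mapping`, so the pruned model indexes experts contiguously.
```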

This model is a Mixture of Experts (MoE) made with mergekit-moe. It uses the following base models: Joseph717171/Llama-3.1-SuperNova-8B-Lite_TIES_with_Base and ArliAI/Llama-3.1-8B-ArliAI-RPMax-v1.2. Heavily inspired by mlabonne/Beyonder-4x7B-v3.
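As a rough sketch only, a mergekit-moe configuration for a merge like this could be generated as below. This is not the author's actual config: the choice of base_model, gate_mode, and the positive_prompts are illustrative guesses; only the expert model names come from the card.

```python
# Sketch: build and write a mergekit-moe YAML config (field names follow
# mergekit-moe's documented schema; values here are assumptions).
import yaml

config = {
    "base_model": "Joseph717171/Llama-3.1-SuperNova-8B-Lite_TIES_with_Base",
    "gate_mode": "hidden",   # hidden-state based gate initialization
    "dtype": "bfloat16",
    "experts": [
        {
            "source_model": "Joseph717171/Llama-3.1-SuperNova-8B-Lite_TIES_with_Base",
            "positive_prompts": ["reasoning", "general assistance"],
        },
        {
            "source_model": "ArliAI/Llama-3.1-8B-ArliAI-RPMax-v1.2",
            "positive_prompts": ["roleplay", "creative writing"],
        },
    ],
}

with open("moe-config.yaml", "w") as f:
    yaml.safe_dump(config, f, sort_keys=False)
# Then, roughly: mergekit-moe moe-config.yaml ./output-model
```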
Repository: localai
License: deepseek-license
DeepSeek-Coder-V2 is an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from DeepSeek-Coder-V2-Base with 6 trillion tokens sourced from a high-quality and multi-source corpus. Through this continued pre-training, DeepSeek-Coder-V2 substantially enhances the coding and mathematical reasoning capabilities of DeepSeek-Coder-V2-Base, while maintaining comparable performance in general language tasks. Compared to DeepSeek-Coder, DeepSeek-Coder-V2 demonstrates significant advancements in various aspects of code-related tasks, as well as reasoning and general capabilities. Additionally, DeepSeek-Coder-V2 expands its support for programming languages from 86 to 338, while extending the context length from 16K to 128K. In standard benchmark evaluations, DeepSeek-Coder-V2 achieves superior performance compared to closed-source models such as GPT4-Turbo, Claude 3 Opus, and Gemini 1.5 Pro in coding and math benchmarks. The list of supported programming languages can be found in the paper.
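A minimal quick-start with Hugging Face transformers is sketched below; the repo id and generation settings are assumptions based on DeepSeek's published instruct checkpoints, not taken from this card.

```python
# Sketch: load a DeepSeek-Coder-V2 instruct checkpoint and generate a reply.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Write a quicksort in Python."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))
```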