LocalAI - Models

lfm2.5-audio-1.5b-realtime

LFM2.5-Audio-1.5B is LiquidAI's any-to-any audio foundation model. The 1.2B LFM2.5 backbone plus a FastConformer audio encoder and an LFM2-based audio detokenizer give real-time speech-to-speech with text + audio output interleaved at 12.5 Hz / 24 kHz. This entry runs in S2S (speech-to-speech) mode and is the model the LocalAI realtime API any-to-any path consumes. Switch to ASR, TTS, or chat by picking the sibling gallery entries.

Links

https://huggingface.co/LiquidAI/LFM2.5-Audio-1.5B

Tags

lfm2.5-audio-1.5b-chat

LFM2.5-Audio-1.5B in text-only chat mode. The model runs `generate_sequential` with no audio modality, behaving like a small LFM2 chat model. Pick this entry for tool-calling experiments without the audio overhead.

Links

https://huggingface.co/LiquidAI/LFM2.5-Audio-1.5B

Tags

lfm2.5-audio-1.5b-asr

LFM2.5-Audio-1.5B in ASR mode. System prompt `Perform ASR.` is prepended; output is capitalised and punctuated. Wire this entry as a transcription model on the /v1/audio/transcriptions endpoint.

Links

https://huggingface.co/LiquidAI/LFM2.5-Audio-1.5B

Tags

lfm2.5-audio-1.5b-tts

LFM2.5-Audio-1.5B in TTS mode. Four baked voices: us_male, us_female, uk_male, uk_female — pick the default at load time via `voice:` option, or override per-request via the OpenAI `/v1/audio/speech` `voice` field.

Links

https://huggingface.co/LiquidAI/LFM2.5-Audio-1.5B

Tags

qwen3-30b-a1.5b-high-speed

This repo contains the full precision source code, in "safe tensors" format to generate GGUFs, GPTQ, EXL2, AWQ, HQQ and other formats. The source code can also be used directly. This is a simple "finetune" of the Qwen's "Qwen 30B-A3B" (MOE) model, setting the experts in use from 8 to 4 (out of 128 experts). This method close to doubles the speed of the model and uses 1.5B (of 30B) parameters instead of 3B (of 30B) parameters. Depending on the application you may want to use the regular model ("30B-A3B"), and use this model for simpler use case(s) although I did not notice any loss of function during routine (but not extensive) testing. Example generation (Q4KS, CPU) at the bottom of this page using 4 experts / this model. More complex use cases may benefit from using the normal version. For reference: Cpu only operation Q4KS (windows 11) jumps from 12 t/s to 23 t/s. GPU performance IQ3S jumps from 75 t/s to over 125 t/s. (low to mid level card) Context size: 32K + 8K for output (40k total)

Links

Tags

opencoder-1.5b-base

The model is a large language model with 1.5 billion parameters, trained on 2.5 trillion tokens of code-related data. It supports both English and Chinese languages and is part of the OpenCoder LLM family which also includes 8B base and chat models. The model achieves high performance across multiple language model benchmarks and is one of the most comprehensively open-sourced models available.

Links

Tags

opencoder-1.5b-instruct

The model is a quantized version of [infly/OpenCoder-1.5B-Instruct](https://huggingface.co/infly/OpenCoder-1.5B-Instruct) created using llama.cpp. The original model, infly/OpenCoder-1.5B-Instruct, is an open and reproducible code LLM family which includes 1.5B and 8B base and chat models, supporting both English and Chinese languages. The model is pretrained on 2.5 trillion tokens composed of 90% raw code and 10% code-related web data, and supervised finetuned on over 4.5M high-quality SFT examples. It achieves high performance across multiple language model benchmarks, positioning it among the leading open-source models for code.

Links

https://huggingface.co/QuantFactory/OpenCoder-1.5B-Instruct-GGUF

Tags

deepseek-r1-distill-qwen-1.5b

DeepSeek-R1 is our advanced first-generation reasoning model designed to enhance performance in reasoning tasks. Building on the foundation laid by its predecessor, DeepSeek-R1-Zero, which was trained using large-scale reinforcement learning (RL) without supervised fine-tuning, DeepSeek-R1 addresses the challenges faced by R1-Zero, such as endless repetition, poor readability, and language mixing. By incorporating cold-start data prior to the RL phase,DeepSeek-R1 significantly improves reasoning capabilities and achieves performance levels comparable to OpenAI-o1 across a variety of domains, including mathematics, coding, and complex reasoning tasks.

Links

Tags

agentica-org_deepscaler-1.5b-preview

DeepScaleR-1.5B-Preview is a language model fine-tuned from DeepSeek-R1-Distilled-Qwen-1.5B using distributed reinforcement learning (RL) to scale up to long context lengths. The model achieves 43.1% Pass@1 accuracy on AIME 2024, representing a 15% improvement over the base model (28.8%) and surpassing OpenAI's O1-Preview performance with just 1.5B parameters.

Links

Tags

knoveleng_open-rs3

This repository hosts model for the Open RS project, accompanying the paper Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn’t. The project explores enhancing reasoning capabilities in small large language models (LLMs) using reinforcement learning (RL) under resource-constrained conditions. We focus on a 1.5-billion-parameter model, DeepSeek-R1-Distill-Qwen-1.5B, trained on 4 NVIDIA A40 GPUs (48 GB VRAM each) within 24 hours. By adapting the Group Relative Policy Optimization (GRPO) algorithm and leveraging a curated, compact mathematical reasoning dataset, we conducted three experiments to assess performance and behavior. Key findings include: Significant reasoning improvements, e.g., AMC23 accuracy rising from 63% to 80% and AIME24 reaching 46.7%, outperforming o1-preview. Efficient training with just 7,000 samples at a cost of $42, compared to thousands of dollars for baseline models. Challenges like optimization instability and length constraints with extended training. These results showcase RL-based fine-tuning as a cost-effective approach for small LLMs, making reasoning capabilities accessible in resource-limited settings. We open-source our code, models, and datasets to support further research.

Links

Tags

agentica-org_deepcoder-1.5b-preview

DeepCoder-1.5B-Preview is a code reasoning LLM fine-tuned from DeepSeek-R1-Distilled-Qwen-1.5B using distributed reinforcement learning (RL) to scale up to long context lengths. Data Our training dataset consists of approximately 24K unique problem-tests pairs compiled from: Taco-Verified PrimeIntellect SYNTHETIC-1 LiveCodeBench v5 (5/1/23-7/31/24)

Links

Tags

zyphra_zr1-1.5b

ZR1-1.5B is a small reasoning model trained extensively on both verified coding and mathematics problems with reinforcement learning. The model outperforms Llama-3.1-70B-Instruct on hard coding tasks and improves upon the base R1-Distill-1.5B model by over 50%, while achieving strong scores on math evaluations and a 37.91% pass@1 accuracy on GPQA-Diamond with just 1.5B parameters.

Links

Tags

nvidia_nemotron-research-reasoning-qwen-1.5b

Nemotron-Research-Reasoning-Qwen-1.5B is the world’s leading 1.5B open-weight model for complex reasoning tasks such as mathematical problems, coding challenges, scientific questions, and logic puzzles. It is trained using the ProRL algorithm on a diverse and comprehensive set of datasets. Our model has achieved impressive results, outperforming Deepseek’s 1.5B model by a large margin on a broad range of tasks, including math, coding, and GPQA. This model is for research and development only.

Links

Tags

arch-router-1.5b-q4

Arch-Router-1.5B is a compact router LLM from Katanemo, fine-tuned from Qwen2.5-1.5B-Instruct. Given a prompt and a set of user-defined route policies (domain + action), it picks the best-matching policy name so requests can be dispatched to the appropriate downstream model. Designed for low-latency, high-throughput use inside the Arch proxy, it pairs with LocalAI's router classifier as a preference-aligned alternative to embedding/ColBERT-based routing on concrete, well-described policies.

Links

Tags

arch-router-1.5b-q8

Arch-Router-1.5B is a compact router LLM from Katanemo, fine-tuned from Qwen2.5-1.5B-Instruct. Given a prompt and a set of user-defined route policies (domain + action), it picks the best-matching policy name so requests can be dispatched to the appropriate downstream model. Designed for low-latency, high-throughput use inside the Arch proxy, it pairs with LocalAI's router classifier as a preference-aligned alternative to embedding/ColBERT-based routing on concrete, well-described policies.

Links

Tags

yi-coder-1.5b-chat

Yi-Coder is a series of open-source code language models that delivers state-of-the-art coding performance with fewer than 10 billion parameters. Key features: Excelling in long-context understanding with a maximum context length of 128K tokens. Supporting 52 major programming languages: 'java', 'markdown', 'python', 'php', 'javascript', 'c++', 'c#', 'c', 'typescript', 'html', 'go', 'java_server_pages', 'dart', 'objective-c', 'kotlin', 'tex', 'swift', 'ruby', 'sql', 'rust', 'css', 'yaml', 'matlab', 'lua', 'json', 'shell', 'visual_basic', 'scala', 'rmarkdown', 'pascal', 'fortran', 'haskell', 'assembly', 'perl', 'julia', 'cmake', 'groovy', 'ocaml', 'powershell', 'elixir', 'clojure', 'makefile', 'coffeescript', 'erlang', 'lisp', 'toml', 'batchfile', 'cobol', 'dockerfile', 'r', 'prolog', 'verilog' For model details and benchmarks, see Yi-Coder blog and Yi-Coder README.

Links

Tags

yi-coder-1.5b

Yi-Coder is a series of open-source code language models that delivers state-of-the-art coding performance with fewer than 10 billion parameters. Key features: Excelling in long-context understanding with a maximum context length of 128K tokens. Supporting 52 major programming languages: 'java', 'markdown', 'python', 'php', 'javascript', 'c++', 'c#', 'c', 'typescript', 'html', 'go', 'java_server_pages', 'dart', 'objective-c', 'kotlin', 'tex', 'swift', 'ruby', 'sql', 'rust', 'css', 'yaml', 'matlab', 'lua', 'json', 'shell', 'visual_basic', 'scala', 'rmarkdown', 'pascal', 'fortran', 'haskell', 'assembly', 'perl', 'julia', 'cmake', 'groovy', 'ocaml', 'powershell', 'elixir', 'clojure', 'makefile', 'coffeescript', 'erlang', 'lisp', 'toml', 'batchfile', 'cobol', 'dockerfile', 'r', 'prolog', 'verilog' For model details and benchmarks, see Yi-Coder blog and Yi-Coder README.

Links

Tags

Model Gallery

Filter by type:

Filter by tags:

lfm2.5-audio-1.5b-realtime

lfm2.5-audio-1.5b-chat

lfm2.5-audio-1.5b-asr

lfm2.5-audio-1.5b-tts

qwen3-30b-a1.5b-high-speed

opencoder-1.5b-base

opencoder-1.5b-instruct

deepseek-r1-distill-qwen-1.5b

agentica-org_deepscaler-1.5b-preview

knoveleng_open-rs3

agentica-org_deepcoder-1.5b-preview

zyphra_zr1-1.5b

nvidia_nemotron-research-reasoning-qwen-1.5b

arch-router-1.5b-q4

arch-router-1.5b-q8

yi-coder-1.5b-chat

yi-coder-1.5b