Model Gallery

8 models from 1 repository

omnilingual-0.3b-ctc-q8-sherpa
Omnilingual ASR CTC 300M (int8) is a multilingual automatic speech recognition model supporting 1,600+ languages. Based on Meta's omniASR_CTC_300M architecture (Wav2Vec2 with CTC head), quantized to int8 for efficient inference. Uses the sherpa-onnx backend with ONNX Runtime.

Repository: localai · License: apache-2.0
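
As a minimal usage sketch (not an official recipe), the model can be queried through LocalAI's OpenAI-compatible transcription endpoint once installed from the gallery; the host, port, and audio file name below are assumptions:

```python
# Transcribe a WAV file via LocalAI's OpenAI-compatible audio endpoint.
# Assumes a LocalAI server on localhost:8080 with this model installed
# under its gallery name; adjust both as needed.
import requests

with open("speech.wav", "rb") as f:
    resp = requests.post(
        "http://localhost:8080/v1/audio/transcriptions",
        files={"file": f},
        data={"model": "omnilingual-0.3b-ctc-q8-sherpa"},
    )
resp.raise_for_status()
print(resp.json()["text"])
```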

streaming-zipformer-en-sherpa
Streaming English ASR: sherpa-onnx zipformer transducer (int8, chunk-16 left-128). Low-latency real-time transcription with endpoint detection via sherpa-onnx's online recognizer. English-only; for multilingual offline ASR see omnilingual-0.3b-ctc-q8-sherpa.

Repository: localai · License: apache-2.0
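
A minimal streaming sketch using the sherpa-onnx Python API; the model file names follow the usual zipformer export naming and are assumptions, as is the 16 kHz mono input:

```python
# Feed audio in small chunks to mimic a live microphone and print a
# segment whenever the endpoint detector fires.
import sherpa_onnx
import soundfile as sf

recognizer = sherpa_onnx.OnlineRecognizer.from_transducer(
    tokens="tokens.txt",
    encoder="encoder.int8.onnx",   # assumed file names; use the files
    decoder="decoder.onnx",        # this gallery entry installs
    joiner="joiner.int8.onnx",
    enable_endpoint_detection=True,  # segment on silence, as described above
)

samples, sample_rate = sf.read("speech.wav", dtype="float32")  # 16 kHz mono
stream = recognizer.create_stream()

chunk = int(0.2 * sample_rate)  # 200 ms chunks
for i in range(0, len(samples), chunk):
    stream.accept_waveform(sample_rate, samples[i : i + chunk])
    while recognizer.is_ready(stream):
        recognizer.decode_stream(stream)
    if recognizer.is_endpoint(stream):
        print(recognizer.get_result(stream))
        recognizer.reset(stream)

stream.input_finished()  # flush whatever remains after the last chunk
while recognizer.is_ready(stream):
    recognizer.decode_stream(stream)
print(recognizer.get_result(stream))
```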

insightface-opencv-int8
Int8-quantized OpenCV Zoo face detection/recognition pair (YuNet int8 for detection, SFace int8 for recognition; ~12 MB total). Roughly 3x smaller and noticeably faster on CPU than the fp32 variant, at comparable accuracy for face tasks. Apache 2.0 licensed, so safe for commercial use. Weights are downloaded on install via LocalAI's gallery mechanism.

Repository: localai · License: apache-2.0
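
A minimal face-verification sketch with OpenCV's bundled YuNet/SFace wrappers (OpenCV >= 4.5.4); the .onnx paths are assumptions, and the 0.363 cosine threshold is the value commonly cited in the OpenCV tutorial:

```python
# Detect a face in each image, extract SFace embeddings, and compare
# them with cosine similarity.
import cv2

detector = cv2.FaceDetectorYN.create("yunet_int8.onnx", "", (320, 320))
recognizer = cv2.FaceRecognizerSF.create("sface_int8.onnx", "")

def embed(path):
    img = cv2.imread(path)
    detector.setInputSize((img.shape[1], img.shape[0]))
    _, faces = detector.detect(img)
    assert faces is not None, f"no face found in {path}"
    aligned = recognizer.alignCrop(img, faces[0])  # use the top detection
    return recognizer.feature(aligned)

score = recognizer.match(embed("a.jpg"), embed("b.jpg"),
                         cv2.FaceRecognizerSF_FR_COSINE)
print("same person" if score >= 0.363 else "different", score)
```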

intellect-1-instruct
INTELLECT-1 is the first collaboratively trained 10-billion-parameter language model, trained from scratch on 1 trillion tokens of English text and code. This is the instruct model; the associated base model is INTELLECT-1. Training ran on up to 14 concurrent nodes distributed across 3 continents, with compute contributed by 30 independent community members.

The training code uses the prime framework, a scalable distributed training framework designed for fault-tolerant, dynamically scaling, high-performance training on unreliable, globally distributed workers. Its key abstraction for dynamic scaling is the ElasticDeviceMesh, which manages dynamic global process groups for fault-tolerant communication across the internet and local process groups for communication within a node. The model was trained with the DiLoCo algorithm using 100 inner steps; the global all-reduce used custom int8 all-reduce kernels to shrink the communication payload, cutting communication overhead by a factor of roughly 400.

Repository: localai · License: apache-2.0
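
To make the training recipe concrete, here is a schematic toy of the DiLoCo inner/outer loop in PyTorch; this is not the prime framework's actual code. The worker count, learning rates, and toy objective are assumptions, and the int8 all-reduce compression is omitted for clarity:

```python
# Each worker takes H local AdamW steps; the averaged parameter deltas
# (the global all-reduce, int8-compressed in INTELLECT-1) are then applied
# by an outer Nesterov-momentum optimizer.
import copy
import torch

H, WORKERS = 100, 4  # 100 inner steps, as described above
global_model = torch.nn.Linear(16, 16)
outer_opt = torch.optim.SGD(global_model.parameters(), lr=0.7,
                            momentum=0.9, nesterov=True)

for outer_step in range(10):
    deltas = [torch.zeros_like(p) for p in global_model.parameters()]
    for _ in range(WORKERS):  # simulated serially; real workers run in parallel
        worker = copy.deepcopy(global_model)
        inner_opt = torch.optim.AdamW(worker.parameters(), lr=1e-3)
        for _ in range(H):
            x = torch.randn(8, 16)
            loss = torch.nn.functional.mse_loss(worker(x), x)  # toy objective
            inner_opt.zero_grad()
            loss.backward()
            inner_opt.step()
        for d, pg, pw in zip(deltas, global_model.parameters(),
                             worker.parameters()):
            d += (pg.data - pw.data) / WORKERS  # average the deltas

    # Treat the averaged delta as the outer "gradient".
    outer_opt.zero_grad()
    for p, d in zip(global_model.parameters(), deltas):
        p.grad = d
    outer_opt.step()
```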

openvino-llama-3-8b-instruct-ov-int8
An int8-quantized OpenVINO IR export of Meta's Llama 3 8B Instruct. Optimized for dialogue use cases and instruction following. Supports an 8k context window.

Repository: localai · License: llama3
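
A minimal loading sketch with optimum-intel, which runs OpenVINO IR exports through the familiar transformers API; the model id below is a hypothetical placeholder for the path or repo this gallery entry installs:

```python
# Load the int8 OpenVINO IR and generate a chat reply on CPU.
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "llama-3-8b-instruct-ov-int8"  # hypothetical local path/repo
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = OVModelForCausalLM.from_pretrained(model_id)

messages = [{"role": "user", "content": "Explain int8 quantization briefly."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True,
                                       return_tensors="pt")
out = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```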

openvino-llama3-aloe
Aloe is a healthcare-focused large language model based on Meta Llama 3 8B, optimized for OpenVINO inference with int8 quantization. It is instruction-tuned for medical and ethical reasoning tasks, offering competitive performance on healthcare QA datasets.

Repository: localai · License: cc-by-nc-4.0
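
A minimal query sketch through LocalAI's OpenAI-compatible chat endpoint, assuming a LocalAI server on localhost:8080 with this model installed under its gallery name:

```python
# LocalAI exposes an OpenAI-compatible API, so the standard openai client
# works by pointing base_url at the local server.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")
resp = client.chat.completions.create(
    model="openvino-llama3-aloe",
    messages=[
        {"role": "system", "content": "You are a careful medical assistant."},
        {"role": "user", "content": "What are common symptoms of anemia?"},
    ],
)
print(resp.choices[0].message.content)
```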

openvino-starling-lm-7b-beta-openvino-int8
Starling-LM-7B-beta is a Mistral-7B-based chat model fine-tuned with RLHF and RLAIF for improved instruction following. This OpenVINO IR version features int8 quantization for optimized local inference. It uses the OpenChat chat template for consistent conversational output.

Repository: localai · License: apache-2.0
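
As an illustration of that template, here is a small helper that builds the published OpenChat 3.5 prompt format; verify the exact strings against the model's tokenizer config before relying on them:

```python
# Build an OpenChat-style prompt: each turn is "<speaker>: <text><|end_of_turn|>",
# ending with an open assistant turn as the generation prompt.
def openchat_prompt(turns):
    """turns: list of (role, text) with role in {"user", "assistant"}."""
    out = ""
    for role, text in turns:
        speaker = ("GPT4 Correct User" if role == "user"
                   else "GPT4 Correct Assistant")
        out += f"{speaker}: {text}<|end_of_turn|>"
    return out + "GPT4 Correct Assistant:"

print(openchat_prompt([("user", "Summarize RLAIF in one sentence.")]))
```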

openvino-hermes2pro-llama3
An OpenVINO-optimized, instruction-tuned 8B Llama-3 model based on the Hermes-2-Pro fine-tune. Supports function calling and JSON mode, and is designed for efficient local inference.

Repository: localai · License: apache-2.0
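
A minimal function-calling sketch against LocalAI's OpenAI-compatible API, assuming the model is installed under its gallery name; the get_weather tool schema is purely illustrative:

```python
# Declare a tool and let the model emit a structured tool call.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for illustration
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]
resp = client.chat.completions.create(
    model="openvino-hermes2pro-llama3",
    messages=[{"role": "user", "content": "Weather in Oslo?"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)
```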