Model Gallery

8 models from 1 repository

omnilingual-0.3b-ctc-q8-sherpa
Omnilingual ASR CTC 300M (int8) is a multilingual automatic speech recognition model supporting 1,600+ languages. Based on Meta's omniASR_CTC_300M architecture (Wav2Vec2 with CTC head), quantized to int8 for efficient inference. Uses the sherpa-onnx backend with ONNX Runtime.

Repository: localai · License: apache-2.0
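
As a minimal usage sketch (not an official recipe), the model can be queried through LocalAI's OpenAI-compatible transcription endpoint once installed from the gallery; the host, port, and audio file name below are assumptions:

```python
# Transcribe a WAV file via LocalAI's OpenAI-compatible audio endpoint.
# Assumes a LocalAI server on localhost:8080 with this model installed
# under its gallery name; adjust both as needed.
import requests

with open("speech.wav", "rb") as f:
    resp = requests.post(
        "http://localhost:8080/v1/audio/transcriptions",
        files={"file": f},
        data={"model": "omnilingual-0.3b-ctc-q8-sherpa"},
    )
resp.raise_for_status()
print(resp.json()["text"])
```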

streaming-zipformer-en-sherpa
Streaming English ASR: sherpa-onnx zipformer transducer (int8, chunk-16 left-128). Low-latency real-time transcription with endpoint detection via sherpa-onnx's online recognizer. English-only; for multilingual offline ASR see omnilingual-0.3b-ctc-q8-sherpa.

Repository: localai · License: apache-2.0
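
A minimal streaming sketch using the sherpa-onnx Python API; the model file names follow the usual zipformer export naming and are assumptions, as is the 16 kHz mono input:

```python
# Feed audio in small chunks to mimic a live microphone and print a
# segment whenever the endpoint detector fires.
import sherpa_onnx
import soundfile as sf

recognizer = sherpa_onnx.OnlineRecognizer.from_transducer(
    tokens="tokens.txt",
    encoder="encoder.int8.onnx",   # assumed file names; use the files
    decoder="decoder.onnx",        # this gallery entry installs
    joiner="joiner.int8.onnx",
    enable_endpoint_detection=True,  # segment on silence, as described above
)

samples, sample_rate = sf.read("speech.wav", dtype="float32")  # 16 kHz mono
stream = recognizer.create_stream()

chunk = int(0.2 * sample_rate)  # 200 ms chunks
for i in range(0, len(samples), chunk):
    stream.accept_waveform(sample_rate, samples[i : i + chunk])
    while recognizer.is_ready(stream):
        recognizer.decode_stream(stream)
    if recognizer.is_endpoint(stream):
        print(recognizer.get_result(stream))
        recognizer.reset(stream)

stream.input_finished()  # flush whatever remains after the last chunk
while recognizer.is_ready(stream):
    recognizer.decode_stream(stream)
print(recognizer.get_result(stream))
```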

insightface-opencv-int8
Int8-quantized OpenCV Zoo face detection/recognition pair (YuNet int8 for detection, SFace int8 for recognition; ~12 MB total). Roughly 3x smaller and noticeably faster on CPU than the fp32 variant, at comparable accuracy for face tasks. Apache 2.0 licensed, so safe for commercial use. Weights are downloaded on install via LocalAI's gallery mechanism.

Repository: localai · License: apache-2.0
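
A minimal face-verification sketch with OpenCV's bundled YuNet/SFace wrappers (OpenCV >= 4.5.4); the .onnx paths are assumptions, and the 0.363 cosine threshold is the value commonly cited in the OpenCV tutorial:

```python
# Detect a face in each image, extract SFace embeddings, and compare
# them with cosine similarity.
import cv2

detector = cv2.FaceDetectorYN.create("yunet_int8.onnx", "", (320, 320))
recognizer = cv2.FaceRecognizerSF.create("sface_int8.onnx", "")

def embed(path):
    img = cv2.imread(path)
    detector.setInputSize((img.shape[1], img.shape[0]))
    _, faces = detector.detect(img)
    assert faces is not None, f"no face found in {path}"
    aligned = recognizer.alignCrop(img, faces[0])  # use the top detection
    return recognizer.feature(aligned)

score = recognizer.match(embed("a.jpg"), embed("b.jpg"),
                         cv2.FaceRecognizerSF_FR_COSINE)
print("same person" if score >= 0.363 else "different", score)
```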

intellect-1-instruct
INTELLECT-1 is the first collaboratively trained 10-billion-parameter language model, trained from scratch on 1 trillion tokens of English text and code. This is the instruct model; the associated base model is INTELLECT-1. Training ran on up to 14 concurrent nodes distributed across 3 continents, with compute contributed by 30 independent community members.

The training code uses the prime framework, a scalable distributed training framework designed for fault-tolerant, dynamically scaling, high-performance training on unreliable, globally distributed workers. Its key abstraction for dynamic scaling is the ElasticDeviceMesh, which manages dynamic global process groups for fault-tolerant communication across the internet and local process groups for communication within a node. The model was trained with the DiLoCo algorithm using 100 inner steps; the global all-reduce used custom int8 all-reduce kernels to shrink the communication payload, cutting communication overhead by a factor of roughly 400.

Repository: localai · License: apache-2.0
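
To make the training recipe concrete, here is a schematic toy of the DiLoCo inner/outer loop in PyTorch; this is not the prime framework's actual code. The worker count, learning rates, and toy objective are assumptions, and the int8 all-reduce compression is omitted for clarity:

```python
# Each worker takes H local AdamW steps; the averaged parameter deltas
# (the global all-reduce, int8-compressed in INTELLECT-1) are then applied
# by an outer Nesterov-momentum optimizer.
import copy
import torch

H, WORKERS = 100, 4  # 100 inner steps, as described above
global_model = torch.nn.Linear(16, 16)
outer_opt = torch.optim.SGD(global_model.parameters(), lr=0.7,
                            momentum=0.9, nesterov=True)

for outer_step in range(10):
    deltas = [torch.zeros_like(p) for p in global_model.parameters()]
    for _ in range(WORKERS):  # simulated serially; real workers run in parallel
        worker = copy.deepcopy(global_model)
        inner_opt = torch.optim.AdamW(worker.parameters(), lr=1e-3)
        for _ in range(H):
            x = torch.randn(8, 16)
            loss = torch.nn.functional.mse_loss(worker(x), x)  # toy objective
            inner_opt.zero_grad()
            loss.backward()
            inner_opt.step()
        for d, pg, pw in zip(deltas, global_model.parameters(),
                             worker.parameters()):
            d += (pg.data - pw.data) / WORKERS  # average the deltas

    # Treat the averaged delta as the outer "gradient".
    outer_opt.zero_grad()
    for p, d in zip(global_model.parameters(), deltas):
        p.grad = d
    outer_opt.step()
```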

openvino-llama-3-8b-instruct-ov-int8
An int8-quantized OpenVINO IR export of Meta's Llama 3 8B Instruct. Optimized for dialogue use cases and instruction following. Supports an 8k context window.

Repository: localai · License: llama3
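
A minimal loading sketch with optimum-intel, which runs OpenVINO IR exports through the familiar transformers API; the model id below is a hypothetical placeholder for the path or repo this gallery entry installs:

```python
# Load the int8 OpenVINO IR and generate a chat reply on CPU.
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "llama-3-8b-instruct-ov-int8"  # hypothetical local path/repo
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = OVModelForCausalLM.from_pretrained(model_id)

messages = [{"role": "user", "content": "Explain int8 quantization briefly."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True,
                                       return_tensors="pt")
out = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```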

openvino-llama3-aloe
Aloe is a healthcare-focused large language model based on Meta Llama 3 8B, optimized for OpenVINO inference with int8 quantization. It is instruction-tuned for medical and ethical reasoning tasks, offering competitive performance on healthcare QA datasets.

Repository: localai · License: cc-by-nc-4.0
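
A minimal query sketch through LocalAI's OpenAI-compatible chat endpoint, assuming a LocalAI server on localhost:8080 with this model installed under its gallery name:

```python
# LocalAI exposes an OpenAI-compatible API, so the standard openai client
# works by pointing base_url at the local server.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")
resp = client.chat.completions.create(
    model="openvino-llama3-aloe",
    messages=[
        {"role": "system", "content": "You are a careful medical assistant."},
        {"role": "user", "content": "What are common symptoms of anemia?"},
    ],
)
print(resp.choices[0].message.content)
```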

openvino-starling-lm-7b-beta-openvino-int8
Starling-LM-7B-beta is a Mistral-7B-based chat model fine-tuned with RLHF and RLAIF for improved instruction following. This OpenVINO IR version features int8 quantization for optimized local inference. It uses the OpenChat chat template for consistent conversational output.

Repository: localai · License: apache-2.0
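
As an illustration of that template, here is a small helper that builds the published OpenChat 3.5 prompt format; verify the exact strings against the model's tokenizer config before relying on them:

```python
# Build an OpenChat-style prompt: each turn is "<speaker>: <text><|end_of_turn|>",
# ending with an open assistant turn as the generation prompt.
def openchat_prompt(turns):
    """turns: list of (role, text) with role in {"user", "assistant"}."""
    out = ""
    for role, text in turns:
        speaker = ("GPT4 Correct User" if role == "user"
                   else "GPT4 Correct Assistant")
        out += f"{speaker}: {text}<|end_of_turn|>"
    return out + "GPT4 Correct Assistant:"

print(openchat_prompt([("user", "Summarize RLAIF in one sentence.")]))
```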

openvino-hermes2pro-llama3
An OpenVINO-optimized, instruction-tuned 8B Llama-3 model based on the Hermes-2-Pro fine-tune. Supports function calling and JSON mode, and is designed for efficient local inference.

Repository: localai · License: apache-2.0
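
A minimal function-calling sketch against LocalAI's OpenAI-compatible API, assuming the model is installed under its gallery name; the get_weather tool schema is purely illustrative:

```python
# Declare a tool and let the model emit a structured tool call.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for illustration
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]
resp = client.chat.completions.create(
    model="openvino-hermes2pro-llama3",
    messages=[{"role": "user", "content": "Weather in Oslo?"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)
```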