An OpenVINO IR build of Meta's Llama 3 8B Instruct with int8 quantization, optimized for dialogue use cases and instruction following. Supports an 8k context window.
Repository: localai · License: mit
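Since LocalAI serves gallery models through an OpenAI-compatible API, a chat request to this model can be sketched as below. The model name and endpoint port are assumptions; adjust them to match your deployment.

```python
import json

# Hypothetical gallery name for this model; check your local gallery listing.
MODEL = "llama-3-8b-instruct-openvino-int8"

# OpenAI-compatible chat payload. POST this as JSON to your LocalAI server,
# e.g. http://localhost:8080/v1/chat/completions (port 8080 is an assumption).
payload = {
    "model": MODEL,
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize OpenVINO in one sentence."},
    ],
    "temperature": 0.7,
}

body = json.dumps(payload)
print(body[:40])
```

The same payload shape works for every chat model in this list; only the `model` field changes.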
An OpenVINO-optimized version of the Phi-3 Mini instruction-tuned model with 3.8 billion parameters. It supports a 128k context window and is designed for reasoning, coding, and chat tasks in compute-constrained environments.
Repository: localai · License: cc-by-nc-4.0

Aloe is a healthcare-focused large language model based on Meta Llama 3 8B, optimized for OpenVINO inference with int8 quantization. It is instruction-tuned for medical and ethical reasoning tasks, offering competitive performance on healthcare QA datasets.
Repository: localai · License: apache-2.0
Starling-LM-7B-beta is a Mistral-7B-based chat model fine-tuned with RLHF and RLAIF for improved instruction following. This OpenVINO IR version features int8 quantization for optimized local inference and uses the OpenChat chat template for consistent conversational output.
WizardLM-2 7B is an instruction-tuned language model optimized for the OpenVINO backend. Supports conversational chat and text completion with an 8192-token context window.
An OpenVINO-optimized 8B instruction-tuned Llama-3 model based on the Hermes-2-Pro fine-tune. Supports function calling and JSON mode, and is designed for efficient inference.
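The function-calling support can be exercised through the standard OpenAI-style `tools` interface that LocalAI's compatible endpoint accepts. A minimal request sketch follows; the model name and the `get_weather` tool schema are illustrative assumptions, not part of the model card.

```python
import json

# Illustrative OpenAI-format "tools" request; the get_weather function
# is hypothetical and stands in for whatever tools your application exposes.
payload = {
    "model": "hermes-2-pro-llama-3-8b",  # assumed gallery name
    "messages": [
        {"role": "user", "content": "What's the weather in Paris?"},
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}

print(json.dumps(payload)[:40])
```

When the model decides a tool is needed, the response carries a `tool_calls` entry with JSON arguments matching the declared parameter schema.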
Multilingual E5 base embedding model optimized for semantic similarity and retrieval tasks. Supports OpenVINO and ONNX inference formats. Ideal for cross-lingual vector search and semantic matching.
Repository: localai · License: apache-2.0
This sentence-transformers model maps text to 384-dimensional dense vectors for semantic similarity tasks. Based on the MiniLM architecture, it is optimized for OpenVINO inference. Ideal for retrieval-augmented generation (RAG) pipelines.
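Retrieval with dense embeddings like these reduces to nearest-neighbour search under cosine similarity. A minimal sketch of that scoring step, with toy 3-d vectors standing in for the model's 384-dimensional sentence embeddings:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-d stand-ins for 384-d embeddings of a query and two documents.
query = [0.1, 0.9, 0.2]
docs = {"doc_a": [0.1, 0.8, 0.3], "doc_b": [0.9, 0.1, 0.0]}

# Rank documents by similarity to the query; the closest one wins.
best = max(docs, key=lambda name: cosine_similarity(query, docs[name]))
print(best)  # doc_a is the closer match
```

In a real RAG pipeline the vectors would come from the embedding model's encode step, and the argmax would be replaced by an approximate nearest-neighbour index over the document store.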