LocalAI - Models

ibm-granite_granite-4.0-h-small

Granite-4.0-H-Small is a 32B parameter long-context instruct model finetuned from Granite-4.0-H-Small-Base using a combination of open source instruction datasets with permissive license and internally collected synthetic datasets. This model is developed using a diverse set of techniques with a structured chat format, including supervised finetuning, model alignment using reinforcement learning, and model merging. Granite 4.0 instruct models feature improved instruction following (IF) and tool-calling capabilities, making them more effective in enterprise applications.

Links

Tags

ibm-granite_granite-4.0-h-tiny

Granite-4.0-H-Tiny is a 7B parameter long-context instruct model finetuned from Granite-4.0-H-Tiny-Base using a combination of open source instruction datasets with permissive license and internally collected synthetic datasets. This model is developed using a diverse set of techniques with a structured chat format, including supervised finetuning, model alignment using reinforcement learning, and model merging. Granite 4.0 instruct models feature improved instruction following (IF) and tool-calling capabilities, making them more effective in enterprise applications.

Links

Tags

ibm-granite_granite-4.0-h-micro

Granite-4.0-H-Micro is a 3B parameter long-context instruct model finetuned from Granite-4.0-H-Micro-Base using a combination of open source instruction datasets with permissive license and internally collected synthetic datasets. This model is developed using a diverse set of techniques with a structured chat format, including supervised finetuning, model alignment using reinforcement learning, and model merging. Granite 4.0 instruct models feature improved instruction following (IF) and tool-calling capabilities, making them more effective in enterprise applications.

Links

Tags

ibm-granite_granite-4.0-micro

Granite-4.0-Micro is a 3B parameter long-context instruct model finetuned from Granite-4.0-Micro-Base using a combination of open source instruction datasets with permissive license and internally collected synthetic datasets. This model is developed using a diverse set of techniques with a structured chat format, including supervised finetuning, model alignment using reinforcement learning, and model merging. Granite 4.0 instruct models feature improved instruction following (IF) and tool-calling capabilities, making them more effective in enterprise applications.

Links

Tags

granite-3.0-1b-a400m-instruct

Granite 3.0 language models are a new set of lightweight state-of-the-art, open foundation models that natively support multilinguality, coding, reasoning, and tool usage, including the potential to be run on constrained compute resources. All the models are publicly released under an Apache 2.0 license for both research and commercial use. The models' data curation and training procedure were designed for enterprise usage and customization in mind, with a process that evaluates datasets for governance, risk and compliance (GRC) criteria, in addition to IBM's standard data clearance process and document quality checks. Granite 3.0 includes 4 different models of varying sizes: Dense Models: 2B and 8B parameter models, trained on 12 trillion tokens in total. Mixture-of-Expert (MoE) Models: Sparse 1B and 3B MoE models, with 400M and 800M activated parameters respectively, trained on 10 trillion tokens in total. Accordingly, these options provide a range of models with different compute requirements to choose from, with appropriate trade-offs with their performance on downstream tasks. At each scale, we release a base model — checkpoints of models after pretraining, as well as instruct checkpoints — models finetuned for dialogue, instruction-following, helpfulness, and safety.

Links

Tags

moe-girl-800ma-3bt

A roleplay-centric finetune of IBM's Granite 3.0 3B-A800M. LoRA finetune trained locally, whereas the others were FFT; while this results in less uptake of training data, it should also mean less degradation in Granite's core abilities, making it potentially easier to use for general-purpose tasks. Disclaimer PLEASE do not expect godliness out of this, it's a model with 800 million active parameters. Expect something more akin to GPT-3 (the original, not GPT-3.5.) (Furthermore, this version is by a less experienced tuner; it's my first finetune that actually has decent-looking graphs, I don't really know what I'm doing yet!)

Links

Tags

ibm-granite_granite-3.2-8b-instruct

Granite-3.2-8B-Instruct is an 8-billion-parameter, long-context AI model fine-tuned for thinking capabilities. Built on top of Granite-3.1-8B-Instruct, it has been trained using a mix of permissively licensed open-source datasets and internally generated synthetic data designed for reasoning tasks. The model allows controllability of its thinking capability, ensuring it is applied only when required.

Links

Tags

ibm-granite_granite-3.2-2b-instruct

Granite-3.2-2B-Instruct is an 2-billion-parameter, long-context AI model fine-tuned for thinking capabilities. Built on top of Granite-3.1-2B-Instruct, it has been trained using a mix of permissively licensed open-source datasets and internally generated synthetic data designed for reasoning tasks. The model allows controllability of its thinking capability, ensuring it is applied only when required.

Links

Tags

granite-embedding-107m-multilingual

Granite-Embedding-107M-Multilingual is a 107M parameter dense biencoder embedding model from the Granite Embeddings suite that can be used to generate high quality text embeddings. This model produces embedding vectors of size 384 and is trained using a combination of open source relevance-pair datasets with permissive, enterprise-friendly license, and IBM collected and generated datasets. This model is developed using contrastive finetuning, knowledge distillation and model merging for improved performance.

Links

Tags

granite-embedding-125m-english

Granite-Embedding-125m-English is a 125M parameter dense biencoder embedding model from the Granite Embeddings suite that can be used to generate high quality text embeddings. This model produces embedding vectors of size 768. Compared to most other open-source models, this model was only trained using open-source relevance-pair datasets with permissive, enterprise-friendly license, plus IBM collected and generated datasets. While maintaining competitive scores on academic benchmarks such as BEIR, this model also performs well on many enterprise use cases. This model is developed using retrieval oriented pretraining, contrastive finetuning and knowledge distillation.

Links

Tags

ibm-granite_granite-3.3-8b-instruct

Granite-3.3-2B-Instruct is a 2-billion parameter 128K context length language model fine-tuned for improved reasoning and instruction-following capabilities. Built on top of Granite-3.3-2B-Base, the model delivers significant gains on benchmarks for measuring generic performance including AlpacaEval-2.0 and Arena-Hard, and improvements in mathematics, coding, and instruction following. It supports structured reasoning through and tags, providing clear separation between internal thoughts and final outputs. The model has been trained on a carefully balanced combination of permissively licensed data and curated synthetic tasks.

Links

Tags

ibm-granite_granite-3.3-2b-instruct

Granite-3.3-2B-Instruct is a 2-billion parameter 128K context length language model fine-tuned for improved reasoning and instruction-following capabilities. Built on top of Granite-3.3-2B-Base, the model delivers significant gains on benchmarks for measuring generic performance including AlpacaEval-2.0 and Arena-Hard, and improvements in mathematics, coding, and instruction following. It supports structured reasoning through and tags, providing clear separation between internal thoughts and final outputs. The model has been trained on a carefully balanced combination of permissively licensed data and curated synthetic tasks.

Links

Tags

ibm-granite.granite-4.0-1b

### **Granite-4.0-1B** *By IBM | Apache 2.0 License* **Overview:** Granite-4.0-1B is a lightweight, instruction-tuned language model designed for efficient on-device and research use. Built on a decoder-only dense transformer architecture, it delivers strong performance in instruction following, code generation, tool calling, and multilingual tasks—making it ideal for applications requiring low latency and minimal resource usage. **Key Features:** - **Size:** 1.6 billion parameters (1B Dense), optimized for efficiency. - **Capabilities:** - Text generation, summarization, question answering - Code completion and function calling (e.g., API integration) - Multilingual support (English, Spanish, French, German, Japanese, Chinese, Arabic, Korean, Portuguese, Italian, Dutch, Czech) - Robust safety and alignment via instruction tuning and reinforcement learning - **Architecture:** Uses GQA (Grouped Query Attention), SwiGLU activation, RMSNorm, shared input/output embeddings, and RoPE position embeddings. - **Context Length:** Up to 128K tokens — suitable for long-form content and complex reasoning. - **Training:** Finetuned from *Granite-4.0-1B-Base* using open-source datasets, synthetic data, and human-curated instruction pairs. **Performance Highlights (1B Dense):** - **MMLU (5-shot):** 59.39 - **HumanEval (pass@1):** 74 - **IFEval (Alignment):** 80.82 - **GSM8K (8-shot):** 76.35 - **SALAD-Bench (Safety):** 93.44 **Use Cases:** - On-device AI applications - Research and prototyping - Fine-tuning for domain-specific tasks - Low-resource environments with high performance expectations **Resources:** - [Hugging Face Model](https://huggingface.co/ibm-granite/granite-4.0-1b) - [Granite Docs](https://www.ibm.com/granite/docs/) - [GitHub Repository](https://github.com/ibm-granite/granite-4.0-nano-language-models) > *“Make knowledge free for everyone.” – IBM Granite Team*

Links

https://huggingface.co/DevQuasar/ibm-granite.granite-4.0-1b-GGUF

Tags

granite-crispasr

IBM Granite Speech 4.0 1B ASR. Runs via the CrispASR backend. Default GGUF size ~2.94 GB.

Links

https://huggingface.co/cstr/granite-speech-4.0-1b-GGUF

Tags

granite-4.1-crispasr

IBM Granite Speech 4.1 2B ASR. Runs via the CrispASR backend. Default GGUF size ~2.94 GB.

Links

https://huggingface.co/cstr/granite-speech-4.1-2b-GGUF

Tags

granite-4.1-plus-crispasr

IBM Granite Speech 4.1 2B Plus ASR. Runs via the CrispASR backend. Default GGUF size ~2.96 GB.

Links

https://huggingface.co/cstr/granite-speech-4.1-2b-plus-GGUF

Tags

granite-4.1-nar-crispasr

IBM Granite Speech 4.1 2B NAR (non-autoregressive) ASR. Runs via the CrispASR backend. Default GGUF size ~3.2 GB.

Links

https://huggingface.co/cstr/granite-speech-4.1-2b-nar-GGUF

Tags

Model Gallery

Filter by type:

Filter by tags:

ibm-granite_granite-4.0-h-small

ibm-granite_granite-4.0-h-tiny

ibm-granite_granite-4.0-h-micro

ibm-granite_granite-4.0-micro

granite-3.0-1b-a400m-instruct

moe-girl-800ma-3bt

ibm-granite_granite-3.2-8b-instruct

ibm-granite_granite-3.2-2b-instruct

granite-embedding-107m-multilingual

granite-embedding-125m-english

ibm-granite_granite-3.3-8b-instruct

ibm-granite_granite-3.3-2b-instruct

ibm-granite.granite-4.0-1b

granite-crispasr

granite-4.1-crispasr

granite-4.1-plus-crispasr

granite-4.1-nar-crispasr