Model Gallery

1 model from 1 repository

nvidia_llama-3.1-8b-ultralong-4m-instruct

We introduce UltraLong-8B, a series of ultra-long-context language models designed to process extensive sequences of text (up to 1M, 2M, and 4M tokens) while maintaining competitive performance on standard benchmarks. Built on Llama-3.1, UltraLong-8B uses a systematic training recipe that combines efficient continued pretraining with instruction tuning to improve long-context understanding and instruction following. This approach lets our models scale their context windows efficiently without sacrificing general performance.

Repository: localai
License: cc-by-nc-4.0