GitHub – OptimalScale/LMFlow: An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All.

LMFlow is an extensible toolbox designed to streamline the finetuning of large machine learning models. It emphasizes user-friendliness, speed, and reliability, aiming to be accessible to the wider community. The toolbox has been tested on Linux and supports CUDA versions 10.3-11.7; a stable branch is available for later CUDA versions.

Recent updates include support for LISA, a memory-efficient finetuning algorithm that allows training of 7B models in 24 GB of memory without offloading; speculative decoding; long context inference with position interpolation for LLaMA models; and integration of FlashAttention-2. Additionally, LMFlow now supports Llama2, ChatGLM2, and Baichuan models.
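
As a rough illustration of the position interpolation idea mentioned above, the sketch below linearly rescales token positions so that a longer sequence fits inside the position range a model saw during training, then computes the rotary (RoPE) angles for a rescaled position. This is not LMFlow's code: the function names, the trained length of 2048, and the target length of 8192 are assumptions chosen for the example.

```python
def interpolated_positions(seq_len, trained_len):
    """Linearly rescale positions 0..seq_len-1 into the range seen in training.

    Sequences that already fit within trained_len are left unscaled.
    """
    scale = min(1.0, trained_len / seq_len)
    return [p * scale for p in range(seq_len)]


def rope_angles(position, head_dim, base=10000.0):
    """Rotation angles RoPE assigns to one position, one per dimension pair."""
    return [position / (base ** (2 * i / head_dim)) for i in range(head_dim // 2)]


# Hypothetical setup: a model trained on 2048 positions, run on 8192 tokens.
positions = interpolated_positions(seq_len=8192, trained_len=2048)
last_token_angles = rope_angles(positions[-1], head_dim=128)
```

With these numbers every position is multiplied by 0.25, so the last token lands just below position 2048 instead of at 8191, which keeps the rotary angles inside the range the model was trained on.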

LMFlow offers a variety of features for finetuning acceleration and memory optimization, such as LISA, LoRA, FlashAttention, gradient checkpointing, and DeepSpeed ZeRO-3. For inference acceleration, it supports LLaMA inference on CPU and FlashAttention. It also provides long context support and model customization options, including vocabulary extension and multimodal chatbot capabilities.
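
To give a sense of why a method like LoRA saves memory, the following back-of-the-envelope sketch compares the trainable parameters of one full weight matrix with those of a low-rank adapter pair. It is only arithmetic, not LMFlow's API; the function names, the 4096 x 4096 matrix size, and the rank of 8 are illustrative assumptions.

```python
def full_finetune_params(d_out, d_in):
    """Trainable parameters when the whole d_out x d_in weight matrix is updated."""
    return d_out * d_in


def lora_params(d_out, d_in, rank):
    """Trainable parameters for a LoRA pair: B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


# Hypothetical projection layer and adapter rank.
full = full_finetune_params(4096, 4096)
lora = lora_params(4096, 4096, rank=8)
reduction = full // lora
```

Under these assumptions the adapter trains 65,536 parameters in place of 16,777,216 for that matrix, a 256-fold reduction, and gradients and optimizer state shrink accordingly.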

The toolkit is available on PyPI for easy installation and comes with a comprehensive set of documentation and examples to get users started quickly. LMFlow is open-source, licensed under Apache 2.0, with commercial use requiring authorization. Users are encouraged to cite the relevant papers if they find LMFlow useful for their research or projects.