GitHub – artidoro/qlora: QLoRA: Efficient Finetuning of Quantized LLMs


GPT-4: QLoRA is an efficient finetuning approach that makes it possible to finetune a 65B-parameter model on a single 48GB GPU while preserving full 16-bit finetuning task performance. It introduces several memory-saving innovations: 4-bit NormalFloat (NF4) quantization, Double Quantization, and Paged Optimizers. The Guanaco model family, trained with QLoRA, outperforms previously openly released models on the Vicuna benchmark. The approach also enables a detailed analysis of instruction following and chatbot performance across many datasets, model types, and scales, yielding state-of-the-art results even with smaller models.
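
The following is a minimal sketch, not taken from the repository, of how the three techniques named above map onto the Hugging Face transformers, peft, and bitsandbytes APIs that the qlora codebase builds on. The model name, LoRA hyperparameters, and training arguments are illustrative assumptions, not the repo's defaults.

```python
# Sketch: QLoRA-style finetuning setup (illustrative values, not the repo's configuration).
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "huggyllama/llama-7b"  # assumed example; any causal LM works

# 4-bit NormalFloat (NF4) quantization with Double Quantization enabled.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",          # 4-bit NormalFloat data type
    bnb_4bit_use_double_quant=True,     # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Freeze the quantized base model and attach trainable LoRA adapters.
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Paged Optimizers: a paged AdamW variant smooths out memory spikes during training.
training_args = TrainingArguments(
    output_dir="qlora-out",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    optim="paged_adamw_32bit",
    bf16=True,
)
```

In this setup only the LoRA adapter weights are trained in 16-bit precision, while the base model stays frozen in 4-bit NF4, which is what keeps the memory footprint low enough for a single GPU.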