🧠 QLORA: Efficient Finetuning of Quantized Large Language Models
About this audio content
The research introduces QLORA, a novel method for efficient finetuning of large language models that quantizes pretrained models to 4-bit precision and trains Low-Rank Adapters on top. This approach drastically reduces memory usage, enabling the finetuning of models with up to 65 billion parameters on a single 48GB GPU while preserving 16-bit finetuning performance. Key innovations include the 4-bit NormalFloat (NF4) data type, double quantization, and paged optimizers to manage memory spikes. Using QLORA, the authors developed Guanaco, a family of models that achieves competitive performance with ChatGPT on the Vicuna benchmark and demonstrates state-of-the-art chatbot capabilities. The paper also examines the importance of data quality over quantity in finetuning and provides an analysis of chatbot evaluation methods, including a comparison between human and GPT-4 assessments.
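To make the description concrete, here is a minimal sketch of how these pieces (4-bit NF4 quantization, double quantization, low-rank adapters, and a paged optimizer) fit together using the Hugging Face transformers, peft, and bitsandbytes libraries. The base model name and hyperparameters are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch of a QLoRA-style finetuning setup (illustrative, not the authors' exact recipe).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # 4-bit NormalFloat data type
    bnb_4bit_use_double_quant=True,        # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16, # compute in 16-bit; weights stay 4-bit
)

# Base model is an assumption; any causal LM supported by bitsandbytes works.
model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# Low-Rank Adapters: only these small matrices receive gradients.
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # adapter parameters are a tiny fraction of the total
```

When training with the transformers Trainer, the paged optimizer described above can be selected with `optim="paged_adamw_32bit"` in `TrainingArguments`, which offloads optimizer state to CPU memory when GPU memory spikes.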