
Fine-Tuning LLAMAv2 with QLora on Google Colab for Free


Generated using ideogram.ai with the prompt: “A photo of LLAMA with the banner written “QLora” on it., 3d render, wildlife photography”

 

Until recently, fine-tuning a 7B model on a single GPU for free on Google Colab was a dream. On 23 May 2023, Tim Dettmers and his team submitted a revolutionary paper[1] on fine-tuning quantized large language models.

A quantized model is a model whose weights are stored in a data type of lower precision than the one it was trained in. For example, you train a model in 32-bit floating point and then convert those weights to a lower data type such as 16/8/4-bit floating point, such that there is minimal to no effect on the performance of the model.

 

[Figure: Source [2]]
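To make this concrete, here is a minimal sketch of naive weight quantization in plain PyTorch (illustrative only; QLoRA itself uses the more sophisticated 4-bit NormalFloat scheme implemented in bitsandbytes):

import torch

# Weights as trained, in 32-bit floating point
weights_fp32 = torch.randn(4, 4)

# Straight cast to a lower-precision float: half the memory per weight
weights_fp16 = weights_fp32.to(torch.float16)

# A naive 8-bit integer scheme: rescale into [-127, 127] and keep the scale
scale = weights_fp32.abs().max() / 127
weights_int8 = (weights_fp32 / scale).round().to(torch.int8)

# Dequantize and check the error introduced by quantization
dequantized = weights_int8.to(torch.float32) * scale
print((weights_fp32 - dequantized).abs().max())  # small, hence minimal performance impact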

 

We are not going to talk much about the theory of quantization here; you can refer to the excellent blog posts by Hugging Face[2][3] and a good YouTube video[4] by Tim Dettmers himself to understand the underlying theory.

In short, it can be said that QLoRA means:

Fine-tuning a quantized large language model using Low-Rank Adaptation matrices (LoRA)[5]
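To build some intuition for the LoRA part before we dive in, here is a minimal sketch (the shapes and names are illustrative, not the actual peft implementation): the pretrained weight matrix W stays frozen, and we train only two small matrices A and B whose product, scaled by alpha/r, is added on top.

import torch

d, k = 512, 512    # layer dimensions (illustrative)
r, alpha = 64, 16  # LoRA rank and scaling factor (cf. lora_r / lora_alpha below)

W = torch.randn(d, k)         # frozen (quantized) pretrained weight
A = torch.randn(r, k) * 0.01  # trainable low-rank factor
B = torch.zeros(d, r)         # trainable, zero-initialized so the update starts at 0

def lora_forward(x):
    # base projection plus the low-rank update, scaled by alpha / r
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

x = torch.randn(2, k)
print(lora_forward(x).shape)  # torch.Size([2, 512])

Only A and B are trained (r*(d + k) parameters instead of d*k), which is what makes fine-tuning a 7B model feasible on a single Colab GPU.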

Let’s jump straight into the code:

 

 

It is important to understand that large language models are designed to take instructions; this was first introduced in the 2022 ACL paper[6]. The idea is simple: we give a language model an instruction, and it follows the instruction and performs that task. So the dataset that we want to fine-tune our model on needs to be in the instruct format; if not, we can convert it.

One of the common formats is the instruct format. We will be using the Alpaca Prompt Template[7], which is:

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Input:
{input}

### Response:
{response}

 

We will be using the SNLI dataset, which has 2 sentences and the relationship between them: whether they are a contradiction, an entailment of each other, or neutral. We will be using it to generate contradictions for a sentence using LLAMAv2. We can load this dataset simply using pandas.

import pandas as pd

df = pd.read_csv('snli_1.0_train_matched.csv')
df['gold_label'].value_counts().plot(kind='barh')

 

[Figure: Labels Distribution]

 

We can see a few random contradiction examples here.

df[df['gold_label'] == 'contradiction'].sample(10)[['sentence1', 'sentence2']]

 

[Figure: Contradiction Examples from SNLI]

 

Now we can create a small function that takes only the contradictory sentences and converts the dataset to the instruct format.

def convert_to_format(row):
    sentence1 = row['sentence1']
    sentence2 = row['sentence2']
    prompt = """Below is an instruction that describes a task paired with input that provides further context. Write a response that appropriately completes the request."""
    instruction = """Given the following sentence, your job is to generate the negation for it in the json format"""
    input = str(sentence1)
    response = f"""```json
{{'original_sentence': '{sentence1}', 'generated_negation': '{sentence2}'}}
```
"""
    if len(input.strip()) == 0:  # prompt + 2 new lines + ### Instruction + new line + input + new line + ### Response
        text = prompt + "\n\n### Instruction:\n" + instruction + "\n### Response:\n" + response
    else:
        text = prompt + "\n\n### Instruction:\n" + instruction + "\n### Input:\n" + input + "\n" + "\n### Response:\n" + response

    # we need 4 columns for auto train: instruction, input, output, text
    return pd.Series([instruction, input, response, text])

new_df = df[df['gold_label'] == 'contradiction'][['sentence1', 'sentence2']].apply(convert_to_format, axis=1)
new_df.columns = ['instruction', 'input', 'output', 'text']

new_df.to_csv('snli_instruct.csv', index=False)

 

Here is an example of a sample data point:

"Under is an instruction that describes a activity paired with enter that gives additional context. Write a response that appropriately completes the request.

### Instruction:
Given the next sentence, your job is to generate the negation for it within the json format
### Enter:
A pair taking part in with a little bit boy on the seashore.

### Response:
```json
{'orignal_sentence': 'A pair taking part in with a little bit boy on the seashore.', 'generated_negation': 'A pair watch a little bit lady play by herself on the seashore.'}
```

 

Now that we have our dataset in the correct format, let’s start fine-tuning. Before starting, let’s install the required packages. We will be using accelerate and peft (Parameter-Efficient Fine-Tuning), combined with Hugging Face’s bitsandbytes and transformers.

!pip install -q accelerate==0.21.0 peft==0.4.0 bitsandbytes==0.40.2 transformers==4.31.0 trl==0.4.7

 

import os
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    HfArgumentParser,
    TrainingArguments,
    pipeline,
    logging,
)
from peft import LoraConfig, PeftModel
from trl import SFTTrainer

 

You can upload the formatted dataset to Drive and load it in Colab.

from google.colab import drive
import pandas as pd

drive.mount('/content/drive')

df = pd.read_csv('/content/drive/MyDrive/snli_instruct.csv')

 

You can convert it to the Hugging Face dataset format easily using the from_pandas method; this will be helpful in training the model.

from datasets import Dataset

dataset = Dataset.from_pandas(df)

 

We will be using the already quantized LLAMAv2 model, which is provided by abhishek/llama-2-7b-hf-small-shards. Let’s define some hyperparameters and variables here:

# The model that you want to train from the Hugging Face hub
model_name = "abhishek/llama-2-7b-hf-small-shards"

# Fine-tuned model name
new_model = "llama-2-contradictor"

################################################################################
# QLoRA parameters
################################################################################

# LoRA attention dimension
lora_r = 64

# Alpha parameter for LoRA scaling
lora_alpha = 16

# Dropout probability for LoRA layers
lora_dropout = 0.1

################################################################################
# bitsandbytes parameters
################################################################################

# Activate 4-bit precision base model loading
use_4bit = True

# Compute dtype for 4-bit base models
bnb_4bit_compute_dtype = "float16"

# Quantization type (fp4 or nf4)
bnb_4bit_quant_type = "nf4"

# Activate nested quantization for 4-bit base models (double quantization)
use_nested_quant = False

################################################################################
# TrainingArguments parameters
################################################################################

# Output directory where the model predictions and checkpoints will be stored
output_dir = "./results"

# Number of training epochs
num_train_epochs = 1

# Enable fp16/bf16 training (set bf16 to True with an A100)
fp16 = False
bf16 = False

# Batch size per GPU for training
per_device_train_batch_size = 4

# Batch size per GPU for evaluation
per_device_eval_batch_size = 4

# Number of update steps to accumulate the gradients for
gradient_accumulation_steps = 1

# Enable gradient checkpointing
gradient_checkpointing = True

# Maximum gradient norm (gradient clipping)
max_grad_norm = 0.3

# Initial learning rate (AdamW optimizer)
learning_rate = 1e-5

# Weight decay to apply to all layers except bias/LayerNorm weights
weight_decay = 0.001

# Optimizer to use
optim = "paged_adamw_32bit"

# Learning rate schedule
lr_scheduler_type = "cosine"

# Number of training steps (overrides num_train_epochs)
max_steps = -1

# Ratio of steps for a linear warmup (from 0 to learning rate)
warmup_ratio = 0.03

# Group sequences into batches with the same length
# Saves memory and speeds up training considerably
group_by_length = True

# Save checkpoint every X update steps
save_steps = 0

# Log every X update steps
logging_steps = 100

################################################################################
# SFT parameters
################################################################################

# Maximum sequence length to use
max_seq_length = None

# Pack multiple short examples in the same input sequence to increase efficiency
packing = False

# Load the entire model on GPU 0
device_map = {"": 0}

 

Most of these are fairly straightforward hyperparameters with these default values. You can always refer to the documentation for more details.

We can now simply use the BitsAndBytesConfig class to create the config for 4-bit fine-tuning.

compute_dtype = getattr(torch, bnb_4bit_compute_dtype)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=use_4bit,
    bnb_4bit_quant_type=bnb_4bit_quant_type,
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=use_nested_quant,
)

 

Now we can load the base model with the 4-bit BitsAndBytesConfig and the tokenizer for fine-tuning.

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map=device_map
)
model.config.use_cache = False  # disable the KV cache during training
model.config.pretraining_tp = 1

 

We can now create the LoRA config and set the training parameters.

# Load LoRA configuration
peft_config = LoraConfig(
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    r=lora_r,
    bias="none",
    task_type="CAUSAL_LM",
)

# Set coaching parameters
training_arguments = TrainingArguments(
    output_dir=output_dir,
    num_train_epochs=num_train_epochs,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    optim=optim,
    save_steps=save_steps,
    logging_steps=logging_steps,
    learning_rate=learning_rate,
    weight_decay=weight_decay,
    fp16=fp16,
    bf16=bf16,
    max_grad_norm=max_grad_norm,
    max_steps=max_steps,
    warmup_ratio=warmup_ratio,
    group_by_length=group_by_length,
    lr_scheduler_type=lr_scheduler_type,
    report_to="tensorboard"
)

 

Now we can simply use the SFTTrainer, which is provided by trl from Hugging Face, to start the training.

# Set supervised fine-tuning parameters
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",  # this is the text column in the dataset
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    args=training_arguments,
    packing=packing,
)

# Train model
trainer.train()

# Save trained model
trainer.model.save_pretrained(new_model)

 

This will start the training for the number of epochs you have set above. Once the model is trained, make sure to save it to Drive so that you can load it again (as you have to restart the session in Colab). You can store the model in Drive via the zip and mv commands.

!zip -r llama-contradictor.zip results llama-2-contradictor
!mv llama-contradictor.zip /content/drive/MyDrive

 

Now when you restart the Colab session, you can move it back into your session again.

!unzip /content/drive/MyDrive/llama-contradictor.zip -d .

 

You need to load the base model again and merge it with the fine-tuned LoRA matrices. This can be done using the merge_and_unload() function.

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

base_model = AutoModelForCausalLM.from_pretrained(
    "abhishek/llama-2-7b-hf-small-shards",
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map={"": 0},
)

model = PeftModel.from_pretrained(base_model, '/content/llama-2-contradictor')
model = model.merge_and_unload()
pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=200)

 

 

You can test your model by simply passing inputs in the same prompt template that we have defined above.

prompt_template = """### Instruction:
Given the following sentence, your job is to generate the negation for it in the json format
### Input:
{}

### Response:
"""

sentence = "The weather forecast predicts a sunny day with a high temperature around 30 degrees Celsius, perfect for a day at the beach with friends and family."

input_sentence = prompt_template.format(sentence.strip())

result = pipe(input_sentence)
print(result)

 

 

Output:

### Instruction:
Given the following sentence, your job is to generate the negation for it in the json format
### Input:
The weather forecast predicts a sunny day with a high temperature around 30 degrees Celsius, perfect for a day at the beach with friends and family.

### Response:
```json
{
  "sentence": "The weather forecast predicts a sunny day with a high temperature around 30 degrees Celsius, perfect for a day at the beach with friends and family.",
  "negation": "The weather forecast predicts a rainy day with a low temperature around 10 degrees Celsius, not ideal for a day at the beach with friends and family."
}
```

 

 

There will be many times when the model keeps on predicting even after the response is generated, because of the token limit. In this case, you need to add a post-processing function that extracts the JSON part, which is what we need. This can be done using a simple regex.

import re
import json

def format_results(s):
    pattern = r'```json\n(.*?)\n```'

    # Find all occurrences of JSON objects in the string
    json_matches = re.findall(pattern, s, re.DOTALL)
    if not json_matches:
        # try to find the 2nd pattern
        pattern = r'{.*?"sentence":.*?"negation":.*?}'
        json_matches = re.findall(pattern, s, re.DOTALL)

    # Return the first JSON object found, or None if no match is found
    return json.loads(json_matches[0]) if json_matches else None

 

This will give you the required output instead of the model repeating random output tokens.
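For example, a hypothetical usage on the pipeline output above (assuming the standard text-generation output structure, a list of dicts with a 'generated_text' key):

# Hypothetical usage of format_results on the pipeline output from earlier
raw_output = result[0]['generated_text']
parsed = format_results(raw_output)
if parsed is not None:
    print(parsed['negation'])  # keys follow the sample model output shown above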

 

 

In this blog, you learned the basics of QLoRA, fine-tuning a LLAMAv2 model on Colab using QLoRA, instruction tuning, and a sample template from the Alpaca dataset that can be used to instruction-tune a model further.

 

References

 

[1]: QLoRA: Efficient Finetuning of Quantized LLMs, 23 May 2023, Tim Dettmers et al.

[2]: https://huggingface.co/blog/hf-bitsandbytes-integration

[3]: https://huggingface.co/blog/4bit-transformers-bitsandbytes

[4]: https://www.youtube.com/watch?v=y9PHWGOa8HA

[5]: https://arxiv.org/abs/2106.09685

[6]: https://aclanthology.org/2022.acl-long.244/

[7]: https://crfm.stanford.edu/2023/03/13/alpaca.html

[8]: Colab Notebook by @maximelabonne https://colab.research.google.com/drive/1PEQyJO1-f6j0S_XJ8DV50NkpzasXkrzd?usp=sharing

 
 
Ahmad Anis is a passionate Machine Learning Engineer and Researcher currently working at redbuffer.ai. Beyond his day job, Ahmad actively engages with the Machine Learning community. He serves as a regional lead for Cohere for AI, a nonprofit dedicated to open science, and is an AWS Community Builder. Ahmad is an active contributor at Stack Overflow, where he has 2300+ points. He has contributed to many well-known open-source projects, including Shap-E by OpenAI.
 


