Different Precision PyTorch#

This tutorial is available as an IPython notebook at Malaya/example/different-precision-pytorch.

Read more at https://huggingface.co/docs/diffusers/optimization/fp16#half-precision-weights

[1]:
%%time

import malaya
import logging
logging.basicConfig(level = logging.INFO)
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
2022-11-10 11:59:24.872708: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
/home/husein/tf-nvidia/lib/python3.8/site-packages/tqdm/auto.py:22: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
/home/husein/dev/malaya/malaya/tokenizer.py:208: FutureWarning: Possible nested set at position 3372
  self.tok = re.compile(r'({})'.format('|'.join(pipeline)))
/home/husein/dev/malaya/malaya/tokenizer.py:208: FutureWarning: Possible nested set at position 3890
  self.tok = re.compile(r'({})'.format('|'.join(pipeline)))
CPU times: user 2.85 s, sys: 3.66 s, total: 6.52 s
Wall time: 1.96 s
[2]:
import torch
[3]:
# https://discuss.pytorch.org/t/finding-model-size/130275

def get_model_size_mb(model):
    # sum the size of all parameters, in bytes
    param_size = 0
    for param in model.model.parameters():
        param_size += param.nelement() * param.element_size()
    # sum the size of all buffers, in bytes
    buffer_size = 0
    for buffer in model.model.buffers():
        buffer_size += buffer.nelement() * buffer.element_size()
    # convert bytes to MB
    return (param_size + buffer_size) / 1024**2

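As a sanity check, the same size arithmetic can be reproduced on a plain PyTorch module. This is a minimal sketch, independent of Malaya; nn.Linear and the dtypes below are standard PyTorch:

import torch
import torch.nn as nn

# 1000 * 1000 weights + 1000 biases, 4 bytes per FP32 element
layer = nn.Linear(1000, 1000)
print(sum(p.nelement() * p.element_size() for p in layer.parameters()) / 1024**2)
# ~3.82 MB

# casting to FP16 halves every element to 2 bytes
layer = layer.half()
print(sum(p.nelement() * p.element_size() for p in layer.parameters()) / 1024**2)
# ~1.91 MB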
Load default precision, FP32#

[4]:
model = malaya.translation.en_ms.huggingface(model = 'mesolitica/finetune-noisy-translation-t5-small-bahasa-cased')
[5]:
get_model_size_mb(model)
[5]:
230.759765625
[6]:
model.generate(['i like chicken'])
[6]:
['saya suka ayam']

Load FP16#

FP16 only works on GPU.

[8]:
model = malaya.translation.en_ms.huggingface(model = 'mesolitica/finetune-noisy-translation-t5-small-bahasa-cased',
                                            torch_dtype=torch.float16)
[9]:
get_model_size_mb(model)
[9]:
115.3798828125
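The size is exactly half of the FP32 model (115.3798828125 MB vs 230.759765625 MB), since each parameter now takes 2 bytes instead of 4. To confirm the precision, a quick check, assuming the wrapper exposes the underlying PyTorch model as model.model (as get_model_size_mb above does):

# the parameters should now be half precision
print(next(model.model.parameters()).dtype)
# expected: torch.float16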

Load INT8#

Requires the latest versions of accelerate and bitsandbytes:

pip3 install accelerate bitsandbytes

INT8 loading only works on GPU.

[12]:
model = malaya.translation.en_ms.huggingface(model = 'mesolitica/finetune-noisy-translation-t5-small-bahasa-cased',
                                            load_in_8bit=True, device_map='auto')