Different Precision PyTorch#
This tutorial is available as an IPython notebook at Malaya/example/different-precision-pytorch.
Read more at https://huggingface.co/docs/diffusers/optimization/fp16#half-precision-weights
[1]:
%%time
import malaya
import logging
logging.basicConfig(level = logging.INFO)
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
2022-11-10 11:59:24.872708: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
/home/husein/tf-nvidia/lib/python3.8/site-packages/tqdm/auto.py:22: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
from .autonotebook import tqdm as notebook_tqdm
/home/husein/dev/malaya/malaya/tokenizer.py:208: FutureWarning: Possible nested set at position 3372
self.tok = re.compile(r'({})'.format('|'.join(pipeline)))
/home/husein/dev/malaya/malaya/tokenizer.py:208: FutureWarning: Possible nested set at position 3890
self.tok = re.compile(r'({})'.format('|'.join(pipeline)))
CPU times: user 2.85 s, sys: 3.66 s, total: 6.52 s
Wall time: 1.96 s
[2]:
import torch
[3]:
# https://discuss.pytorch.org/t/finding-model-size/130275
def get_model_size_mb(model):
    param_size = 0
    for param in model.model.parameters():
        param_size += param.nelement() * param.element_size()
    buffer_size = 0
    for buffer in model.model.buffers():
        buffer_size += buffer.nelement() * buffer.element_size()
    return (param_size + buffer_size) / 1024**2
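The helper sums the byte size of every parameter and buffer, then converts bytes to MB by dividing by 1024**2. A quick back-of-envelope check of that conversion in plain Python, using a hypothetical linear layer (no model needed; the layer sizes here are made up for illustration):

```python
# Hypothetical linear layer: a 100x100 weight matrix plus a 100-element bias.
n_params = 100 * 100 + 100
# FP32 stores 4 bytes per parameter.
bytes_total = n_params * 4
# Convert bytes to MB, matching the helper above.
size_mb = bytes_total / 1024**2
print(size_mb)  # ≈ 0.0385 MB
```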
Load default precision, FP32#
[4]:
model = malaya.translation.en_ms.huggingface(model = 'mesolitica/finetune-noisy-translation-t5-small-bahasa-cased')
[5]:
get_model_size_mb(model)
[5]:
230.759765625
[6]:
model.generate(['i like chicken'])
[6]:
['saya suka ayam']
Load FP16#
Only works on GPU.
[8]:
model = malaya.translation.en_ms.huggingface(model = 'mesolitica/finetune-noisy-translation-t5-small-bahasa-cased',
                                             torch_dtype=torch.float16)
[9]:
get_model_size_mb(model)
[9]:
115.3798828125
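The FP16 footprint is exactly half the FP32 one: every parameter shrinks from 4 bytes to 2. A quick sanity check of the arithmetic, where the parameter count is inferred from the FP32 size reported above (so treat it as an estimate, not an official figure):

```python
# Inferred parameter count: 230.759765625 MB * 1024**2 bytes/MB / 4 bytes/param.
n_params = 60_492_288
fp32_mb = n_params * 4 / 1024**2  # 4 bytes per parameter
fp16_mb = n_params * 2 / 1024**2  # 2 bytes per parameter
print(fp32_mb, fp16_mb)  # 230.759765625 115.3798828125
```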
Load INT8#
Requires the latest versions of accelerate and bitsandbytes,

pip3 install accelerate bitsandbytes

Only works on GPU.
[12]:
model = malaya.translation.en_ms.huggingface(model = 'mesolitica/finetune-noisy-translation-t5-small-bahasa-cased',
                                             load_in_8bit=True, device_map='auto')
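For a rough expectation of the INT8 footprint: 8-bit weights take 1 byte each, so a naive lower bound is one quarter of the FP32 size. In practice the measured size will be somewhat larger, since bitsandbytes keeps some modules (e.g. layer norms) in higher precision; the sketch below, reusing the parameter count inferred from the FP32 size above, is only that back-of-envelope bound:

```python
n_params = 60_492_288  # estimated from the FP32 size reported earlier
int8_mb = n_params * 1 / 1024**2  # 1 byte per weight
print(int8_mb)  # ≈ 57.69 MB, a lower bound on the real footprint
```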