Different Precision#

This tutorial is available as an IPython notebook at Malaya/example/different-precision.

Read more at https://huggingface.co/docs/diffusers/optimization/fp16#half-precision-weights

[1]:
%%time

import malaya
import logging
logging.basicConfig(level = logging.INFO)
CPU times: user 2.88 s, sys: 3.46 s, total: 6.34 s
Wall time: 2.21 s
/home/husein/dev/malaya/malaya/tokenizer.py:214: FutureWarning: Possible nested set at position 3397
  self.tok = re.compile(r'({})'.format('|'.join(pipeline)))
/home/husein/dev/malaya/malaya/tokenizer.py:214: FutureWarning: Possible nested set at position 3927
  self.tok = re.compile(r'({})'.format('|'.join(pipeline)))
[2]:
import torch
[3]:
# https://discuss.pytorch.org/t/finding-model-size/130275

def get_model_size_mb(model):
    # the underlying transformers model is exposed as `model.model`;
    # sum the bytes used by parameters and buffers, then convert to MB
    param_size = 0
    for param in model.model.parameters():
        param_size += param.nelement() * param.element_size()
    buffer_size = 0
    for buffer in model.model.buffers():
        buffer_size += buffer.nelement() * buffer.element_size()
    return (param_size + buffer_size) / 1024**2

Load default precision, FP32#

[5]:
model = malaya.translation.huggingface(model = 'mesolitica/translation-t5-small-standard-bahasa-cased')
Loading the tokenizer from the `special_tokens_map.json` and the `added_tokens.json` will be removed in `transformers 5`,  it is kept for forward compatibility, but it is recommended to update your `tokenizer_config.json` by uploading it again. You will see the new `added_tokens_decoder` attribute that will store the relevant information.
You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. If you see this, DO NOT PANIC! This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[6]:
get_model_size_mb(model)
[6]:
230.765625
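As a rough sanity check (a sketch based on the number above, not part of the original notebook): in FP32 each parameter occupies 4 bytes, so the reported size implies roughly 60 million parameters, ignoring buffers.

# back-of-the-envelope check: FP32 stores 4 bytes per parameter,
# so ~230.77 MB corresponds to roughly 60 million parameters (buffers ignored)
approx_params = 230.765625 * 1024**2 / 4
print(f'{approx_params / 1e6:.1f}M parameters')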
[7]:
model.generate(['i like chicken'])
/home/husein/.local/lib/python3.8/site-packages/transformers/generation/utils.py:1260: UserWarning: Using the model-agnostic default `max_length` (=20) to control the generation length. We recommend setting `max_new_tokens` to control the maximum length of the generation.
  warnings.warn(
[7]:
['Saya suka ayam']

Load FP16#

This only works on GPU.

[9]:
model = malaya.translation.huggingface(model = 'mesolitica/translation-t5-small-standard-bahasa-cased',
                                            torch_dtype=torch.float16)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[10]:
get_model_size_mb(model)
[10]:
139.3828125
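For reference, a minimal sketch of the plain transformers call this is assumed to be equivalent to; the torch_dtype keyword is presumably forwarded to from_pretrained, which casts the weights to half precision at load time (an assumption, not verified against Malaya internals).

import torch
from transformers import T5ForConditionalGeneration

# sketch: load the same checkpoint directly in half precision
hf_model = T5ForConditionalGeneration.from_pretrained(
    'mesolitica/translation-t5-small-standard-bahasa-cased',
    torch_dtype=torch.float16,
)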

Load INT8#

Requires the latest versions of accelerate and bitsandbytes:

pip3 install accelerate bitsandbytes

This only works on GPU.

[12]:
model = malaya.translation.huggingface(model = 'mesolitica/translation-t5-small-standard-bahasa-cased',
                                            load_in_8bit=True, device_map='auto')
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[13]:
get_model_size_mb(model)
[13]:
109.3828125
[14]:
model.generate(['i like chicken'])
[14]:
['Saya suka ayam']
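For reference, a minimal sketch of the underlying 8-bit loading path in recent transformers releases, where quantization options are grouped into a BitsAndBytesConfig; the load_in_8bit keyword above is assumed to be forwarded in the same way.

from transformers import BitsAndBytesConfig, T5ForConditionalGeneration

# sketch: 8-bit weights via bitsandbytes, layers placed on GPU automatically
quantization_config = BitsAndBytesConfig(load_in_8bit=True)
hf_model = T5ForConditionalGeneration.from_pretrained(
    'mesolitica/translation-t5-small-standard-bahasa-cased',
    quantization_config=quantization_config,
    device_map='auto',
)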

Load INT4#

Requires the latest versions of accelerate and bitsandbytes:

pip3 install accelerate bitsandbytes

This only works on GPU.

[15]:
model = malaya.translation.huggingface(model = 'mesolitica/translation-t5-small-standard-bahasa-cased',
                                            load_in_4bit=True, device_map='auto')
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[16]:
get_model_size_mb(model)
[16]:
94.3828125
[17]:
model.generate(['i like chicken'])
[17]:
['Saya suka ayam']
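Similarly, a minimal sketch of 4-bit loading with an explicit BitsAndBytesConfig; the compute dtype shown is an illustrative choice, not something the notebook sets.

import torch
from transformers import BitsAndBytesConfig, T5ForConditionalGeneration

# sketch: 4-bit weights via bitsandbytes; matmuls run in the chosen compute dtype
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)
hf_model = T5ForConditionalGeneration.from_pretrained(
    'mesolitica/translation-t5-small-standard-bahasa-cased',
    quantization_config=quantization_config,
    device_map='auto',
)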