Precision Mode
This tutorial is available as an IPython notebook at Malaya/example/precision-mode.
Let's say you want to run the model in FP16 or FP64.
[1]:
import malaya
import logging
logging.basicConfig(level = logging.INFO)
Use specific precision for specific model#
To do that, pass the precision_mode
parameter to any model-loading function in Malaya,
malaya.sentiment.transformer(model = 'albert', precision_mode = 'FP16')
Supported precision modes are {'BFLOAT16', 'FP16', 'FP32', 'FP64'}
; the default is FP32
. Check the code at https://github.com/huseinzol05/malaya-boilerplate/blob/main/malaya_boilerplate/frozen_graph.py
[2]:
albert = malaya.sentiment.transformer(model = 'albert')
albert_fp16 = malaya.sentiment.transformer(model = 'albert', precision_mode = 'FP16')
INFO:root:running sentiment/albert using device /device:CPU:0
INFO:root:running sentiment/albert using device /device:CPU:0
Converting sentiment/albert to FP16.
[3]:
string = 'ketiak saya masam tapi saya comel'
[5]:
%%time
albert.predict_proba([string])
CPU times: user 166 ms, sys: 15.9 ms, total: 182 ms
Wall time: 47.1 ms
[5]:
[{'negative': 0.8387252, 'positive': 0.0016127465, 'neutral': 0.15966207}]
[7]:
%%time
albert_fp16.predict_proba([string])
CPU times: user 14.6 s, sys: 53.3 ms, total: 14.6 s
Wall time: 2.21 s
[7]:
[{'negative': 0.839, 'positive': 0.001611, 'neutral': 0.1597}]
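Notice the FP16 probabilities look rounded compared to the FP32 output. That is expected: half precision keeps only a 10-bit mantissa, roughly 3 significant decimal digits. A minimal NumPy sketch (NumPy here is an illustration, not part of Malaya's API) showing the rounding on the 'negative' probability from the FP32 model above:

```python
import numpy as np

fp32_prob = 0.8387252  # 'negative' probability from the FP32 model above
fp16_prob = np.float16(fp32_prob)  # cast down to half precision

# float16 keeps only a 10-bit mantissa (~3 significant decimal digits),
# so the value snaps to the nearest representable half-precision float.
print(float(fp16_prob))                    # 0.8388671875
print(abs(float(fp16_prob) - fp32_prob))   # rounding error ~1.4e-4
```

The error is tiny relative to the probability itself, which is why the predicted label is unchanged even though the printed numbers differ.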
Running in FP16 is not necessarily faster; most CPUs are not optimized for FP16 arithmetic. For an actual speedup you might want to look into NVIDIA RTX GPUs and newer, which have native FP16 support.
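You can reproduce this effect outside Malaya with a hypothetical micro-benchmark: most CPU BLAS libraries have no native FP16 kernels, so NumPy falls back to a slower generic path for float16 matrix multiplies, and the half-precision result also drifts slightly from the FP32 one:

```python
import time
import numpy as np

# Same matrix in single and half precision.
a32 = np.random.rand(512, 512).astype(np.float32)
a16 = a32.astype(np.float16)

t0 = time.perf_counter()
r32 = a32 @ a32                      # dispatched to an optimized BLAS kernel
t32 = time.perf_counter() - t0

t0 = time.perf_counter()
r16 = a16 @ a16                      # no FP16 BLAS on CPU: generic slow path
t16 = time.perf_counter() - t0

print(f'FP32: {t32:.4f}s, FP16: {t16:.4f}s')
# Accumulated rounding error of the half-precision product:
print(float(np.max(np.abs(r32 - r16.astype(np.float32)))))
```

Exact timings depend on your CPU and BLAS build, but FP16 is typically the slower of the two on CPU, matching the wall times in the cells above.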