Precision Mode#

This tutorial is available as an IPython notebook at Malaya/example/precision-mode.

Say you want to run a model in FP16 or FP64.

[1]:
import malaya
import logging
logging.basicConfig(level = logging.INFO)

Use a specific precision for a specific model#

To do that, pass the precision_mode parameter to any model-loading function in Malaya, for example:

malaya.sentiment.transformer(model = 'albert', precision_mode = 'FP16')

The supported precision modes are {'BFLOAT16', 'FP16', 'FP32', 'FP64'}; the default is 'FP32'. You can check the code at https://github.com/huseinzol05/malaya-boilerplate/blob/main/malaya_boilerplate/frozen_graph.py
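
As a quick sketch (hypothetical, not an executed cell; it assumes every mode downloads and converts successfully on your machine), you could load one instance of the same albert model per supported mode:

# hypothetical sketch: one model instance per supported precision mode
models = {
    mode: malaya.sentiment.transformer(model = 'albert', precision_mode = mode)
    for mode in ['BFLOAT16', 'FP16', 'FP32', 'FP64']
}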

[2]:
albert = malaya.sentiment.transformer(model = 'albert')
albert_fp16 = malaya.sentiment.transformer(model = 'albert', precision_mode = 'FP16')
INFO:root:running sentiment/albert using device /device:CPU:0
INFO:root:running sentiment/albert using device /device:CPU:0
Converting sentiment/albert to FP16.
[3]:
string = 'ketiak saya masam tapi saya comel'
[5]:
%%time

albert.predict_proba([string])
CPU times: user 166 ms, sys: 15.9 ms, total: 182 ms
Wall time: 47.1 ms
[5]:
[{'negative': 0.8387252, 'positive': 0.0016127465, 'neutral': 0.15966207}]
[7]:
%%time

albert_fp16.predict_proba([string])
CPU times: user 14.6 s, sys: 53.3 ms, total: 14.6 s
Wall time: 2.21 s
[7]:
[{'negative': 0.839, 'positive': 0.001611, 'neutral': 0.1597}]
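
Notice the FP16 model returns essentially the same predictions, just reported with fewer significant digits.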

Running in FP16 is not necessarily faster; most CPUs are not optimized for FP16, so you might want to look into a GPU (RTX series and above).
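
If you want to opt into FP16 only when it is likely to pay off, a minimal sketch (assuming a TensorFlow 2.x runtime; tf.config.list_physical_devices is standard TensorFlow, not part of Malaya's API) is to check for a visible GPU first:

import tensorflow as tf

# request FP16 only when a GPU is visible, otherwise keep the FP32 default
if tf.config.list_physical_devices('GPU'):
    model = malaya.sentiment.transformer(model = 'albert', precision_mode = 'FP16')
else:
    model = malaya.sentiment.transformer(model = 'albert')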