GPU Environment#
This tutorial is available as an IPython notebook at Malaya/example/gpu-environment.
[1]:
%%time
import malaya
import logging
logging.basicConfig(level = logging.INFO)
CPU times: user 6.33 s, sys: 2.5 s, total: 8.83 s
Wall time: 4.65 s
List available GPUs#
[2]:
malaya.utils.available_gpu()
[2]:
[('GPU:0', '29.16 GB'),
('GPU:1', '29.163 GB'),
('GPU:2', '29.163 GB'),
('GPU:3', '29.163 GB')]
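The returned list can also be used to choose a device programmatically. The following is a minimal sketch, assuming the return value is a list of (device name, 'N GB') tuples as in the output above; `parse_gb` and `pick_largest_gpu` are hypothetical helpers, not part of Malaya.

```python
def parse_gb(size):
    """Convert a string such as '29.163 GB' to a float number of gigabytes."""
    value, unit = size.split()
    if unit != 'GB':
        raise ValueError(f'unexpected unit: {unit!r}')
    return float(value)


def pick_largest_gpu(gpus):
    """Return the device name with the largest reported memory (hypothetical helper)."""
    if not gpus:
        raise ValueError('no GPU available')
    # max() returns the first item among ties, so the lowest-numbered device wins.
    return max(gpus, key=lambda item: parse_gb(item[1]))[0]


# Sample data copied from the output above; in practice this would be
# the value returned by malaya.utils.available_gpu().
gpus = [
    ('GPU:0', '29.16 GB'),
    ('GPU:1', '29.163 GB'),
    ('GPU:2', '29.163 GB'),
    ('GPU:3', '29.163 GB'),
]
print(pick_largest_gpu(gpus))  # prints GPU:1
```

The chosen name can then be passed as the device argument of a load-model API.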
Limit GPU memory#
By default, Malaya does not cap GPU memory. To set a cap, pass the gpu_limit parameter to any load-model API. gpu_limit must satisfy 0 < gpu_limit < 1; for example, gpu_limit = 0.3 means the model will not use more than 30% of GPU memory.
malaya.sentiment.transformer(gpu_limit = 0.3)
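To make the fraction concrete, the arithmetic can be sketched as follows. This is only an illustration of the rule 0 < gpu_limit < 1; `memory_cap_gb` is a hypothetical helper rather than a Malaya function, and the 29.163 GB figure is taken from the device list shown earlier.

```python
def memory_cap_gb(total_gb, gpu_limit):
    """Return the memory cap in GB implied by a gpu_limit fraction (hypothetical helper)."""
    if not 0 < gpu_limit < 1:
        raise ValueError('gpu_limit must satisfy 0 < gpu_limit < 1')
    return total_gb * gpu_limit


cap = memory_cap_gb(29.163, 0.3)
print(f'{cap:.2f} GB')  # prints 8.75 GB
```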
Not all operations are supported by GPU#
Some models may run faster on CPU because of the overhead of moving data between CPU and GPU too frequently, for example, the transformer model from the T2T library.
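Whether a particular model is faster on CPU or GPU is best settled by measurement. Below is a minimal timing sketch using only the standard library; `mean_latency` is a hypothetical helper, and the `sum` call is a stand-in for a real prediction call.

```python
import time


def mean_latency(fn, *args, repeat=5, warmup=1):
    """Return the mean wall-clock seconds per call of fn(*args)."""
    # Warm-up calls are excluded from the measurement, since the first
    # call of a model usually pays one-off initialisation costs.
    for _ in range(warmup):
        fn(*args)
    start = time.perf_counter()
    for _ in range(repeat):
        fn(*args)
    return (time.perf_counter() - start) / repeat


# Stand-in workload; replace with the predict method of a loaded model.
latency = mean_latency(sum, list(range(1000)))
print(f'{latency * 1e6:.1f} microseconds per call')
```

Running the same measurement against the same model loaded on each device gives a direct comparison for the batch sizes that matter in practice.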
N models to N GPUs#
To place a model on another GPU, set device to a different GPU, e.g. GPU:1. The default is GPU:0.
model_sentiment = malaya.sentiment.transformer(model = 'bert', gpu_limit = 0.5, device = 'GPU:0')
model_subjectivity = malaya.subjectivity.transformer(model = 'bert', gpu_limit = 0.5, device = 'GPU:1')
model_emotion = malaya.emotion.transformer(model = 'bert', gpu_limit = 0.5, device = 'GPU:2')
model_translation = malaya.translation.ms_en.transformer(gpu_limit = 0.5, device = 'GPU:3')
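When there are more models than GPUs, the device strings can be assigned round-robin instead of by hand. This is a minimal sketch; `assign_devices` and the model labels are hypothetical and only produce the device values that would be passed to each load call.

```python
from itertools import cycle


def assign_devices(model_names, devices):
    """Map each model name to a device, wrapping around when devices run out."""
    if not devices:
        raise ValueError('at least one device is required')
    return dict(zip(model_names, cycle(devices)))


plan = assign_devices(
    ['sentiment', 'subjectivity', 'emotion', 'translation'],
    ['GPU:0', 'GPU:1', 'GPU:2', 'GPU:3'],
)
for name, device in plan.items():
    print(name, '->', device)
```

With four models and four GPUs this reproduces the one-model-per-GPU layout above; with fewer GPUs, several models share a device, so their gpu_limit values should be chosen to fit together.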
GPU Rules#
Malaya does not claim all available GPU memory up front; usage grows gradually with batch size. This growth is one-way: memory use increases dynamically as larger batches arrive, but it does not shrink again when smaller batches are fed.
Use malaya.utils.close_session to clear the session of unused models. Note that this will not free GPU memory.
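The grow-only behaviour can be pictured with a toy model. This is not Malaya or TensorFlow code; `GrowOnlyAllocation` and its one-unit-per-item cost are assumptions made purely to illustrate that the reservation tracks the largest batch seen so far.

```python
class GrowOnlyAllocation:
    """Toy model of a memory reservation that can grow but never shrink."""

    def __init__(self):
        self.reserved = 0

    def feed(self, batch_size, cost_per_item=1):
        """Record a batch and return the reservation after processing it."""
        needed = batch_size * cost_per_item
        # Grow when the batch needs more; otherwise keep what is already held.
        self.reserved = max(self.reserved, needed)
        return self.reserved


allocation = GrowOnlyAllocation()
history = [allocation.feed(size) for size in (8, 32, 4)]
print(history)  # prints [8, 32, 32]
```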
[5]:
anger_text = 'babi la company ni, aku dah la penat datang dari jauh'
fear_text = 'takut doh tengok cerita hantu tadi'
happy_text = 'bestnya dapat tidur harini, tak payah pergi kerja'
love_text = 'aku sayang sgt dia dah doh'
sadness_text = 'kecewa tengok kerajaan baru ni, janji ape pun tak dapat'
surprise_text = 'sakit jantung aku, terkejut dengan cerita hantu tadi'
[6]:
model_sentiment = malaya.sentiment.transformer(model = 'bert', gpu_limit = 0.5, device = 'GPU:0')
model_subjectivity = malaya.subjectivity.transformer(model = 'bert', gpu_limit = 0.5, device = 'GPU:1')
model_emotion = malaya.emotion.transformer(model = 'bert', gpu_limit = 0.5, device = 'GPU:2')
model_translation = malaya.translation.ms_en.transformer(gpu_limit = 0.5, device = 'GPU:3')
WARNING:tensorflow:From /home/husein/malaya/Malaya/malaya/function/__init__.py:73: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.
WARNING:tensorflow:From /home/husein/malaya/Malaya/malaya/function/__init__.py:75: The name tf.GraphDef is deprecated. Please use tf.compat.v1.GraphDef instead.
WARNING:tensorflow:From /home/husein/malaya/Malaya/malaya/function/__init__.py:50: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.
WARNING:tensorflow:From /home/husein/malaya/Malaya/malaya/function/__init__.py:65: The name tf.InteractiveSession is deprecated. Please use tf.compat.v1.InteractiveSession instead.
[7]:
%%time
model_sentiment.predict_proba(
[anger_text, fear_text, happy_text, love_text, sadness_text, surprise_text]
)
model_subjectivity.predict_proba(
[anger_text, fear_text, happy_text, love_text, sadness_text, surprise_text]
)
model_emotion.predict_proba(
[anger_text, fear_text, happy_text, love_text, sadness_text, surprise_text]
)
model_translation.translate(['Mahathir buat keputusan terburu-buru'])
CPU times: user 8.61 s, sys: 2.71 s, total: 11.3 s
Wall time: 10.8 s
[7]:
['Mahathir made a hasty decision']
[8]:
!nvidia-smi
Sun Jul 12 19:26:18 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.129      Driver Version: 410.129      CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-DGXS...  On   | 00000000:07:00.0  On |                    0 |
| N/A   45C    P0    54W / 300W |   1101MiB / 32475MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-DGXS...  On   | 00000000:08:00.0 Off |                    0 |
| N/A   46C    P0    52W / 300W |   1100MiB / 32478MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-DGXS...  On   | 00000000:0E:00.0 Off |                    0 |
| N/A   45C    P0    52W / 300W |   1100MiB / 32478MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla V100-DGXS...  On   | 00000000:0F:00.0 Off |                    0 |
| N/A   45C    P0    53W / 300W |   1100MiB / 32478MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     12786      C   /usr/bin/python3                            1089MiB |
|    1     12786      C   /usr/bin/python3                            1089MiB |
|    2     12786      C   /usr/bin/python3                            1089MiB |
|    3     12786      C   /usr/bin/python3                            1089MiB |
+-----------------------------------------------------------------------------+
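In the output above, each GPU holds roughly 1.1 GB, well below the 50% cap set by gpu_limit = 0.5, which matches the grow-on-demand rule. To check this programmatically, the Memory-Usage column can be extracted with a regular expression. This is a minimal sketch that assumes the `used MiB / total MiB` layout shown here; `memory_usage` is a hypothetical helper, and the sample rows are copied from the output above.

```python
import re

# Matches the Memory-Usage column, e.g. "1101MiB / 32475MiB".
# Power readings such as "54W / 300W" do not match because of the MiB suffix.
MEMORY_PATTERN = re.compile(r'(\d+)MiB\s*/\s*(\d+)MiB')


def memory_usage(smi_output):
    """Return a list of (used_mib, total_mib) pairs, one per GPU row."""
    return [
        (int(used), int(total))
        for used, total in MEMORY_PATTERN.findall(smi_output)
    ]


# Two rows copied from the nvidia-smi output above.
sample = """
| N/A 45C P0 54W / 300W | 1101MiB / 32475MiB | 0% Default |
| N/A 46C P0 52W / 300W | 1100MiB / 32478MiB | 0% Default |
"""

for used, total in memory_usage(sample):
    print(f'{used} / {total} MiB ({used / total:.1%} used)')
```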