GPU Environment Tensorflow#
This tutorial is available as an IPython notebook at Malaya/example/gpu-environment-tensorflow.
[1]:
import os

# Allow TensorFlow to allocate GPU memory incrementally instead of reserving it all upfront.
os.environ['TF_FORCE_GPU_ALLOW_GROWTH'] = 'true'
[2]:
%%time
import malaya
import logging
logging.basicConfig(level = logging.INFO)
CPU times: user 4.06 s, sys: 3.48 s, total: 7.54 s
Wall time: 4.07 s
List available GPU#
You must install the GPU build of TensorFlow first to enable GPU hardware acceleration.
[3]:
from tensorflow.python.client import device_lib
device_lib.list_local_devices()
2023-02-09 10:59:50.394171: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-02-09 10:59:50.676494: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-09 10:59:52.527725: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:39] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2023-02-09 10:59:52.527766: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /device:GPU:0 with 1190 MB memory: -> device: 0, name: NVIDIA GeForce RTX 3090 Ti, pci bus id: 0000:01:00.0, compute capability: 8.6
2023-02-09 10:59:52.529121: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-09 10:59:52.529492: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:39] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2023-02-09 10:59:52.529525: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /device:GPU:1 with 2299 MB memory: -> device: 1, name: NVIDIA GeForce RTX 3090 Ti, pci bus id: 0000:07:00.0, compute capability: 8.6
[3]:
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 17455430732188064581,
name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 1248460800
locality {
bus_id: 1
links {
}
}
incarnation: 17187613631775953589
physical_device_desc: "device: 0, name: NVIDIA GeForce RTX 3090 Ti, pci bus id: 0000:01:00.0, compute capability: 8.6",
name: "/device:GPU:1"
device_type: "GPU"
memory_limit: 2411331584
locality {
bus_id: 1
links {
}
}
incarnation: 18149323579634163815
physical_device_desc: "device: 1, name: NVIDIA GeForce RTX 3090 Ti, pci bus id: 0000:07:00.0, compute capability: 8.6"]
Run model on GPU#
We can follow the steps from the TensorFlow GPU guide at https://www.tensorflow.org/guide/gpu.
[4]:
import tensorflow as tf
tf.debugging.set_log_device_placement(True)
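With device placement logging enabled, TensorFlow prints which device executes each operation. A minimal sketch adapted from the TensorFlow GPU guide linked above, pinning a small matmul to the first GPU:
import tensorflow as tf

tf.debugging.set_log_device_placement(True)

# explicitly place the tensors and the matmul on the first GPU
with tf.device('/GPU:0'):
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    b = tf.constant([[1.0, 1.0], [1.0, 1.0]])
    c = tf.matmul(a, b)

print(c)  # the placement log should report MatMul assigned to GPU:0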
[5]:
malaya.sentiment.available_transformer()
INFO:malaya.sentiment:tested on test set at https://github.com/huseinzol05/malay-dataset/tree/master/sentiment/semisupervised-twitter-3class
[5]:
| | Size (MB) | Quantized Size (MB) | macro precision | macro recall | macro f1-score |
| --- | --- | --- | --- | --- | --- |
| bert | 425.6 | 111.00 | 0.93182 | 0.93442 | 0.93307 |
| tiny-bert | 57.4 | 15.40 | 0.93390 | 0.93141 | 0.93262 |
| albert | 48.6 | 12.80 | 0.91228 | 0.91929 | 0.91540 |
| tiny-albert | 22.4 | 5.98 | 0.91442 | 0.91646 | 0.91521 |
| xlnet | 446.6 | 118.00 | 0.92390 | 0.92629 | 0.92444 |
| alxlnet | 46.8 | 13.30 | 0.91896 | 0.92589 | 0.92198 |
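The Quantized Size column refers to the quantized frozen graphs. Assuming the sentiment loader accepts the same quantized flag used elsewhere in Malaya (an assumption here, not shown in this notebook), a quantized model can be loaded on GPU like this:
# quantized = True is an assumption based on other Malaya transformer interfaces
model = malaya.sentiment.transformer(model = 'tiny-bert', quantized = True, device = 'GPU:0')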
Malaya frozen graph interfaces#
load graph#
All Malaya TensorFlow model interfaces pass their keyword arguments to malaya_boilerplate.frozen_graph.load_graph:
def load_graph(package, frozen_graph_filename, **kwargs):
"""
Load frozen graph from a checkpoint.
Parameters
----------
frozen_graph_filename: str
precision_mode: str, optional (default='FP32')
change precision frozen graph, only supported one of ['BFLOAT16', 'FP16', 'FP32', 'FP64'].
device: str, optional (default='CPU:0')
device to use for specific model, read more at https://www.tensorflow.org/guide/gpu
Returns
-------
result : tensorflow.Graph
"""
generate session#
Once the graph is loaded, it will be passed to malaya_boilerplate.frozen_graph.generate_session to generate a session for the TensorFlow graph:
def generate_session(graph, **kwargs):
"""
Load session for a Tensorflow graph.
Parameters
----------
graph: tensorflow.Graph
gpu_limit: float, optional (default = 0.999)
limit percentage to use a gpu memory.
Returns
-------
result : tensorflow.Session
"""
[6]:
tiny_albert = malaya.sentiment.transformer(model = 'tiny-albert', device = 'GPU:0')
2022-10-22 22:50:23.644489: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-10-22 22:50:23.649400: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 17248 MB memory: -> device: 0, name: NVIDIA GeForce RTX 3090 Ti, pci bus id: 0000:01:00.0, compute capability: 8.6
2022-10-22 22:50:26.837274: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-10-22 22:50:26.840379: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /device:GPU:0 with 17248 MB memory: -> device: 0, name: NVIDIA GeForce RTX 3090 Ti, pci bus id: 0000:01:00.0, compute capability: 8.6
2022-10-22 22:50:27.023275: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-10-22 22:50:27.026483: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 17248 MB memory: -> device: 0, name: NVIDIA GeForce RTX 3090 Ti, pci bus id: 0000:01:00.0, compute capability: 8.6
[7]:
tiny_albert.predict_proba(['hello'])
2022-10-22 22:50:27.931657: I tensorflow/stream_executor/cuda/cuda_blas.cc:1760] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
[7]:
[{'negative': 0.001145982, 'neutral': 0.99809974, 'positive': 0.0007543671}]
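Since two GPUs were listed earlier, the same interface can target the second device; a sketch (assuming GPU:1 is free, as in the device list above):
model_gpu1 = malaya.sentiment.transformer(model = 'tiny-albert', device = 'GPU:1')
model_gpu1.predict_proba(['hello'])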