{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Different Precision" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "This tutorial is available as an IPython notebook at [Malaya/example/different-precision](https://github.com/huseinzol05/Malaya/tree/master/example/different-precision).\n", " \n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Read more at https://huggingface.co/docs/diffusers/optimization/fp16#half-precision-weights" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 2.88 s, sys: 3.46 s, total: 6.34 s\n", "Wall time: 2.21 s\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/home/husein/dev/malaya/malaya/tokenizer.py:214: FutureWarning: Possible nested set at position 3397\n", " self.tok = re.compile(r'({})'.format('|'.join(pipeline)))\n", "/home/husein/dev/malaya/malaya/tokenizer.py:214: FutureWarning: Possible nested set at position 3927\n", " self.tok = re.compile(r'({})'.format('|'.join(pipeline)))\n" ] } ], "source": [ "%%time\n", "\n", "import malaya\n", "import logging\n", "logging.basicConfig(level = logging.INFO)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "scrolled": true }, "outputs": [], "source": [ "import torch" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# https://discuss.pytorch.org/t/finding-model-size/130275\n", "\n", "def get_model_size_mb(model):\n", " param_size = 0\n", " for param in model.model.parameters():\n", " param_size += param.nelement() * param.element_size()\n", " buffer_size = 0\n", " for buffer in model.model.buffers():\n", " buffer_size += buffer.nelement() * buffer.element_size()\n", " return (param_size + buffer_size) / 1024**2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Load default precision, FP32" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Loading the tokenizer from the `special_tokens_map.json` and the `added_tokens.json` will be removed in `transformers 5`, it is kept for forward compatibility, but it is recommended to update your `tokenizer_config.json` by uploading it again. 
You will see the new `added_tokens_decoder` attribute that will store the relevant information.\n", "You are using the default legacy behaviour of the . If you see this, DO NOT PANIC! This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565\n", "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n" ] } ], "source": [ "model = malaya.translation.huggingface(model = 'mesolitica/translation-t5-small-standard-bahasa-cased')" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "230.765625" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "get_model_size_mb(model)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/home/husein/.local/lib/python3.8/site-packages/transformers/generation/utils.py:1260: UserWarning: Using the model-agnostic default `max_length` (=20) to control the generation length. We recommend setting `max_new_tokens` to control the maximum length of the generation.\n", " warnings.warn(\n" ] }, { "data": { "text/plain": [ "['Saya suka ayam']" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model.generate(['i like chicken'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Load FP16\n", "\n", "**Only works on GPU**." 
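, "\n", "As a rough sanity check (back-of-envelope arithmetic, not an output of this notebook): at FP32 every parameter takes 4 bytes, so the ~230.77 MB measured above corresponds to roughly 230.77 * 1024^2 / 4, about 60.5M parameters, consistent with a T5-small. FP16 stores 2 bytes per parameter, so the weights alone shrink to about 115 MB; the measured size can be somewhat larger if some buffers or modules remain in full precision."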
] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n" ] } ], "source": [ "model = malaya.translation.huggingface(model = 'mesolitica/translation-t5-small-standard-bahasa-cased',\n", " torch_dtype=torch.float16)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "139.3828125" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "get_model_size_mb(model)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Load INT8\n", "\n", "Requires the latest versions of `accelerate` and `bitsandbytes`:\n", "\n", "```bash\n", "pip3 install accelerate bitsandbytes\n", "```\n", "\n", "**Only works on GPU**." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n" ] } ], "source": [ "model = malaya.translation.huggingface(model = 'mesolitica/translation-t5-small-standard-bahasa-cased',\n", " load_in_8bit=True, device_map='auto')" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "109.3828125" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "get_model_size_mb(model)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['Saya suka ayam']" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model.generate(['i like chicken'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Load INT4\n", "\n", "Requires the latest versions of `accelerate` and `bitsandbytes`:\n", "\n", "```bash\n", 
"pip3 install accelerate bitsandbytes\n", "```\n", "\n", "**Only works on GPU**." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n" ] } ], "source": [ "model = malaya.translation.huggingface(model = 'mesolitica/translation-t5-small-standard-bahasa-cased',\n", " load_in_4bit=True, device_map='auto')" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "94.3828125" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "get_model_size_mb(model)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['Saya suka ayam']" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model.generate(['i like chicken'])" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" } }, "nbformat": 4, "nbformat_minor": 2 }