{ "cells": [ { "cell_type": "markdown", "id": "increased-pantyhose", "metadata": {}, "source": [ "# Quantization" ] }, { "cell_type": "markdown", "id": "magnetic-porter", "metadata": {}, "source": [ "
\n", "\n", "This tutorial is available as an IPython notebook at [Malaya/example/quantization](https://github.com/huseinzol05/Malaya/tree/master/example/quantization).\n", " \n", "
" ] }, { "cell_type": "markdown", "id": "outdoor-retailer", "metadata": {}, "source": [ "We provided Quantized model for all Malaya models, example, sentiment transformer models," ] }, { "cell_type": "code", "execution_count": 1, "id": "gothic-empty", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO:root:tested on 20% test set.\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Size (MB)Quantized Size (MB)macro precisionmacro recallmacro f1-score
bert425.6111.000.993300.993300.99329
tiny-bert57.415.400.987740.987740.98774
albert48.612.800.992270.992260.99226
tiny-albert22.45.980.985540.985500.98551
xlnet446.6118.000.993530.993530.99353
alxlnet46.813.300.991880.991880.99188
\n", "
" ], "text/plain": [ " Size (MB) Quantized Size (MB) macro precision macro recall \\\n", "bert 425.6 111.00 0.99330 0.99330 \n", "tiny-bert 57.4 15.40 0.98774 0.98774 \n", "albert 48.6 12.80 0.99227 0.99226 \n", "tiny-albert 22.4 5.98 0.98554 0.98550 \n", "xlnet 446.6 118.00 0.99353 0.99353 \n", "alxlnet 46.8 13.30 0.99188 0.99188 \n", "\n", " macro f1-score \n", "bert 0.99329 \n", "tiny-bert 0.98774 \n", "albert 0.99226 \n", "tiny-albert 0.98551 \n", "xlnet 0.99353 \n", "alxlnet 0.99188 " ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import malaya\n", "\n", "malaya.sentiment.available_transformer()" ] }, { "cell_type": "markdown", "id": "positive-bracelet", "metadata": {}, "source": [ "Usually quantized model able to compress 4x of original size. This quantized model will convert all possible floating constants to quantized constants, and only stored mean, standard deviation of floating constants and quantized constants.\n", "\n", "Again, quantized model is not necessary faster, because tensorflow will cast back to FP32 during feed-forward for certain operations." ] }, { "cell_type": "markdown", "id": "functioning-youth", "metadata": {}, "source": [ "### Use quantized model\n", "\n", "Simply pass `quantized` parameter become `True`, default is `False`." ] }, { "cell_type": "code", "execution_count": 2, "id": "better-religious", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "WARNING:root:Load quantized model will cause accuracy drop.\n", "INFO:root:running sentiment/albert-quantized using device /device:CPU:0\n", "INFO:root:running sentiment/albert using device /device:CPU:0\n" ] } ], "source": [ "albert_quantized = malaya.sentiment.transformer(model = 'albert', quantized = True)\n", "albert = malaya.sentiment.transformer(model = 'albert')" ] }, { "cell_type": "code", "execution_count": 3, "id": "imposed-uganda", "metadata": {}, "outputs": [], "source": [ "string = 'saya masam awak pun masam'" ] }, { "cell_type": "code", "execution_count": 5, "id": "dietary-lottery", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 171 ms, sys: 15.9 ms, total: 187 ms\n", "Wall time: 47.2 ms\n" ] }, { "data": { "text/plain": [ "['negative']" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%time\n", "\n", "albert.predict([string])" ] }, { "cell_type": "code", "execution_count": 9, "id": "trying-capability", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 181 ms, sys: 41.1 ms, total: 223 ms\n", "Wall time: 53.8 ms\n" ] }, { "data": { "text/plain": [ "['negative']" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%time\n", "\n", "albert_quantized.predict([string])" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.7" } }, "nbformat": 4, "nbformat_minor": 5 }