{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Isi Penting Generator HuggingFace product description style" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Generate a long text in product description style from a list of isi penting (important facts)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "This tutorial is available as an IPython notebook at [Malaya/example/isi-penting-generator-huggingface-product-description-style](https://github.com/huseinzol05/Malaya/tree/master/example/isi-penting-generator-huggingface-product-description-style).\n", " \n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "Results generated using stochastic methods.\n", " \n", "
" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/home/husein/dev/malaya/malaya/tokenizer.py:208: FutureWarning: Possible nested set at position 3386\n", " self.tok = re.compile(r'({})'.format('|'.join(pipeline)))\n", "/home/husein/dev/malaya/malaya/tokenizer.py:208: FutureWarning: Possible nested set at position 3904\n", " self.tok = re.compile(r'({})'.format('|'.join(pipeline)))\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 3.34 s, sys: 3.29 s, total: 6.63 s\n", "Wall time: 2.57 s\n" ] } ], "source": [ "%%time\n", "import malaya\n", "from pprint import pprint" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### List available HuggingFace" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Model | Size (MB) | ROUGE-1 | ROUGE-2 | ROUGE-L | Suggested length
mesolitica/finetune-isi-penting-generator-t5-small-standard-bahasa-cased | 242.0 | 0.246203 | 0.058961 | 0.15159 | 1024.0
mesolitica/finetune-isi-penting-generator-t5-base-standard-bahasa-cased | 892.0 | 0.246203 | 0.058961 | 0.15159 | 1024.0
\n", "
" ], "text/plain": [ " Size (MB) ROUGE-1 \\\n", "mesolitica/finetune-isi-penting-generator-t5-sm... 242.0 0.246203 \n", "mesolitica/finetune-isi-penting-generator-t5-ba... 892.0 0.246203 \n", "\n", " ROUGE-2 ROUGE-L \\\n", "mesolitica/finetune-isi-penting-generator-t5-sm... 0.058961 0.15159 \n", "mesolitica/finetune-isi-penting-generator-t5-ba... 0.058961 0.15159 \n", "\n", " Suggested length \n", "mesolitica/finetune-isi-penting-generator-t5-sm... 1024.0 \n", "mesolitica/finetune-isi-penting-generator-t5-ba... 1024.0 " ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "malaya.generator.isi_penting.available_huggingface()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Load HuggingFace\n", "\n", "The Transformer Generator in Malaya works differently from most text generation models found online, such as GPT-2 or Markov chains, which simply continue a prefix supplied by the user. Instead, given a list of 'isi penting' (important facts), it generates a full article, similar to a karangan (essay) written in high school.\n", "\n", "```python\n", "def huggingface(model: str = 'mesolitica/finetune-isi-penting-generator-t5-base-standard-bahasa-cased', **kwargs):\n", " \"\"\"\n", " Load HuggingFace model to generate text based on isi penting.\n", "\n", " Parameters\n", " ----------\n", " model: str, optional (default='mesolitica/finetune-isi-penting-generator-t5-base-standard-bahasa-cased')\n", " Check available models at `malaya.generator.isi_penting.available_huggingface()`.\n", "\n", " Returns\n", " -------\n", " result: malaya.torch_model.huggingface.IsiPentingGenerator\n", " \"\"\"\n", "```" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "6cd9efd4b7d64d23b71703d2f3a89fe2", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Downloading: 0%| | 0.00/822 [00:00