{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# MS to EN HuggingFace" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "This tutorial is available as an IPython notebook at [Malaya/example/ms-en-translation-huggingface](https://github.com/huseinzol05/Malaya/tree/master/example/ms-en-translation-huggingface).\n", " \n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "This module trained on standard language and augmented local language structures, proceed with caution.\n", " \n", "
" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import os\n", "\n", "os.environ['CUDA_VISIBLE_DEVICES'] = ''" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 3.38 s, sys: 3.55 s, total: 6.93 s\n", "Wall time: 2.23 s\n" ] } ], "source": [ "%%time\n", "\n", "import malaya\n", "import logging\n", "\n", "logging.basicConfig(level=logging.INFO)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### List available HuggingFace models" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO:malaya.translation.ms_en:tested on FLORES200 EN-MS (eng_Latn-zsm_Latn) pair `dev` set, https://github.com/facebookresearch/flores/tree/main/flores200\n", "INFO:malaya.translation.ms_en:for noisy, tested on noisy twitter google translation, https://huggingface.co/datasets/mesolitica/augmentation-test-set\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Size (MB)BLEUSacreBLEU VerboseSacreBLEU-chrF++-FLORES200Suggested length
mesolitica/finetune-translation-t5-super-super-tiny-standard-bahasa-cased23.330.21614464.9/38.1/24.1/15.3 (BP = 0.978 ratio = 0.978 ...56.46256
mesolitica/finetune-translation-t5-super-tiny-standard-bahasa-cased50.734.10561567.3/41.6/27.8/18.7 (BP = 0.982 ratio = 0.982 ...59.18256
mesolitica/finetune-translation-t5-tiny-standard-bahasa-cased13937.26048568.3/44.1/30.5/21.4 (BP = 0.995 ratio = 0.995 ...61.29256
mesolitica/finetune-translation-t5-small-standard-bahasa-cased24242.01021871.7/49.0/35.6/26.1 (BP = 0.989 ratio = 0.989 ...64.67256
mesolitica/finetune-translation-t5-base-standard-bahasa-cased89243.40885372.3/50.5/37.1/27.7 (BP = 0.987 ratio = 0.987 ...65.44256
mesolitica/finetune-noisy-translation-t5-tiny-bahasa-cased13939.72513469.8/46.2/32.8/23.6 (BP = 0.999 ratio = 0.999 ...None256
mesolitica/finetune-noisy-translation-t5-small-bahasa-cased24241.83407171.7/48.7/35.4/26.0 (BP = 0.989 ratio = 0.989 ...None256
mesolitica/finetune-noisy-translation-t5-base-bahasa-cased89243.43272371.8/49.8/36.6/27.2 (BP = 1.000 ratio = 1.000 ...None256
mesolitica/finetune-noisy-translation-t5-tiny-bahasa-cased-v213960.00096777.9/63.9/54.6/47.7 (BP = 1.000 ratio = 1.036 ...None256
mesolitica/finetune-noisy-translation-t5-small-bahasa-cased-v424264.06258280.1/67.7/59.1/52.5 (BP = 1.000 ratio = 1.042 ...None256
mesolitica/finetune-noisy-translation-t5-base-bahasa-cased-v289264.58381980.2/68.1/59.8/53.2 (BP = 1.000 ratio = 1.048 ...None256
\n", "
" ], "text/plain": [ " Size (MB) BLEU \\\n", "mesolitica/finetune-translation-t5-super-super-... 23.3 30.216144 \n", "mesolitica/finetune-translation-t5-super-tiny-s... 50.7 34.105615 \n", "mesolitica/finetune-translation-t5-tiny-standar... 139 37.260485 \n", "mesolitica/finetune-translation-t5-small-standa... 242 42.010218 \n", "mesolitica/finetune-translation-t5-base-standar... 892 43.408853 \n", "mesolitica/finetune-noisy-translation-t5-tiny-b... 139 39.725134 \n", "mesolitica/finetune-noisy-translation-t5-small-... 242 41.834071 \n", "mesolitica/finetune-noisy-translation-t5-base-b... 892 43.432723 \n", "mesolitica/finetune-noisy-translation-t5-tiny-b... 139 60.000967 \n", "mesolitica/finetune-noisy-translation-t5-small-... 242 64.062582 \n", "mesolitica/finetune-noisy-translation-t5-base-b... 892 64.583819 \n", "\n", " SacreBLEU Verbose \\\n", "mesolitica/finetune-translation-t5-super-super-... 64.9/38.1/24.1/15.3 (BP = 0.978 ratio = 0.978 ... \n", "mesolitica/finetune-translation-t5-super-tiny-s... 67.3/41.6/27.8/18.7 (BP = 0.982 ratio = 0.982 ... \n", "mesolitica/finetune-translation-t5-tiny-standar... 68.3/44.1/30.5/21.4 (BP = 0.995 ratio = 0.995 ... \n", "mesolitica/finetune-translation-t5-small-standa... 71.7/49.0/35.6/26.1 (BP = 0.989 ratio = 0.989 ... \n", "mesolitica/finetune-translation-t5-base-standar... 72.3/50.5/37.1/27.7 (BP = 0.987 ratio = 0.987 ... \n", "mesolitica/finetune-noisy-translation-t5-tiny-b... 69.8/46.2/32.8/23.6 (BP = 0.999 ratio = 0.999 ... \n", "mesolitica/finetune-noisy-translation-t5-small-... 71.7/48.7/35.4/26.0 (BP = 0.989 ratio = 0.989 ... \n", "mesolitica/finetune-noisy-translation-t5-base-b... 71.8/49.8/36.6/27.2 (BP = 1.000 ratio = 1.000 ... \n", "mesolitica/finetune-noisy-translation-t5-tiny-b... 77.9/63.9/54.6/47.7 (BP = 1.000 ratio = 1.036 ... \n", "mesolitica/finetune-noisy-translation-t5-small-... 80.1/67.7/59.1/52.5 (BP = 1.000 ratio = 1.042 ... \n", "mesolitica/finetune-noisy-translation-t5-base-b... 80.2/68.1/59.8/53.2 (BP = 1.000 ratio = 1.048 ... \n", "\n", " SacreBLEU-chrF++-FLORES200 \\\n", "mesolitica/finetune-translation-t5-super-super-... 56.46 \n", "mesolitica/finetune-translation-t5-super-tiny-s... 59.18 \n", "mesolitica/finetune-translation-t5-tiny-standar... 61.29 \n", "mesolitica/finetune-translation-t5-small-standa... 64.67 \n", "mesolitica/finetune-translation-t5-base-standar... 65.44 \n", "mesolitica/finetune-noisy-translation-t5-tiny-b... None \n", "mesolitica/finetune-noisy-translation-t5-small-... None \n", "mesolitica/finetune-noisy-translation-t5-base-b... None \n", "mesolitica/finetune-noisy-translation-t5-tiny-b... None \n", "mesolitica/finetune-noisy-translation-t5-small-... None \n", "mesolitica/finetune-noisy-translation-t5-base-b... None \n", "\n", " Suggested length \n", "mesolitica/finetune-translation-t5-super-super-... 256 \n", "mesolitica/finetune-translation-t5-super-tiny-s... 256 \n", "mesolitica/finetune-translation-t5-tiny-standar... 256 \n", "mesolitica/finetune-translation-t5-small-standa... 256 \n", "mesolitica/finetune-translation-t5-base-standar... 256 \n", "mesolitica/finetune-noisy-translation-t5-tiny-b... 256 \n", "mesolitica/finetune-noisy-translation-t5-small-... 256 \n", "mesolitica/finetune-noisy-translation-t5-base-b... 256 \n", "mesolitica/finetune-noisy-translation-t5-tiny-b... 256 \n", "mesolitica/finetune-noisy-translation-t5-small-... 256 \n", "mesolitica/finetune-noisy-translation-t5-base-b... 256 " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "malaya.translation.ms_en.available_huggingface()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These huggingface models trained on:\n", " \n", "1. EN-MS dataset, https://huggingface.co/datasets/mesolitica/en-ms\n", "2. MS-EN dataset, https://huggingface.co/datasets/mesolitica/ms-en\n", "3. NLLB eng_Latn-zsm_Latn, https://github.com/huseinzol05/malay-dataset/tree/master/translation/laser" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Load Transformer models\n", "\n", "```python\n", "def huggingface(model: str = 'mesolitica/finetune-translation-t5-small-standard-bahasa-cased', **kwargs):\n", " \"\"\"\n", " Load HuggingFace model to translate MS-to-EN.\n", "\n", " Parameters\n", " ----------\n", " model: str, optional (default='mesolitica/finetune-translation-t5-small-standard-bahasa-cased')\n", " Check available models at `malaya.translation.ms_en.available_huggingface()`.\n", "\n", " Returns\n", " -------\n", " result: malaya.torch_model.huggingface.Generator\n", " \"\"\"\n", "```" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "scrolled": true }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2022-10-04 21:25:52.819612: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA\n", "To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.\n", "INFO:malaya_boilerplate.frozen_graph:running home/husein/.cache/huggingface/hub/models--huseinzol05--translation-ms-en-base/snapshots/c163027ea2df8ba8364b601396fa89fcf263ece5 using device /device:CPU:0\n", "2022-10-04 21:25:52.825155: E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected\n", "2022-10-04 21:25:52.825193: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: husein-MS-7D31\n", "2022-10-04 21:25:52.825200: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: husein-MS-7D31\n", "2022-10-04 21:25:52.825279: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: Not found: was unable to find libcuda.so DSO loaded into this program\n", "2022-10-04 21:25:52.825311: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 470.141.3\n" ] } ], "source": [ "transformer = malaya.translation.ms_en.transformer()" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "transformer_huggingface = malaya.translation.ms_en.huggingface()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Translate\n", "\n", "```python\n", "def generate(self, strings: List[str], **kwargs):\n", " \"\"\"\n", " Generate texts from the input.\n", "\n", " Parameters\n", " ----------\n", " strings : List[str]\n", " **kwargs: vector arguments pass to huggingface `generate` method.\n", " Read more at https://huggingface.co/docs/transformers/main_classes/text_generation\n", "\n", " Returns\n", " -------\n", " result: List[str]\n", " \"\"\"\n", "```\n", "\n", "**For better results, always split by end of sentences**." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "from pprint import pprint" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "('TANGKAK - Tan Sri Muhyiddin Yassin berkata, beliau tidak mahu menyentuh '\n", " 'mengenai isu politik buat masa ini, sebaliknya mahu menumpukan kepada soal '\n", " 'kebajikan rakyat serta usaha merancakkan semula ekonomi negara yang terjejas '\n", " 'berikutan pandemik Covid-19. Perdana Menteri menjelaskan perkara itu ketika '\n", " 'berucap pada Majlis Bertemu Pemimpin bersama pemimpin masyarakat Dewan '\n", " 'Undangan Negeri (DUN) Gambir di Dewan Serbaguna Bukit Gambir hari ini.')\n" ] } ], "source": [ "# https://www.sinarharian.com.my/article/89678/BERITA/Politik/Saya-tidak-mahu-sentuh-isu-politik-Muhyiddin\n", "\n", "string_news1 = 'TANGKAK - Tan Sri Muhyiddin Yassin berkata, beliau tidak mahu menyentuh mengenai isu politik buat masa ini, sebaliknya mahu menumpukan kepada soal kebajikan rakyat serta usaha merancakkan semula ekonomi negara yang terjejas berikutan pandemik Covid-19. Perdana Menteri menjelaskan perkara itu ketika berucap pada Majlis Bertemu Pemimpin bersama pemimpin masyarakat Dewan Undangan Negeri (DUN) Gambir di Dewan Serbaguna Bukit Gambir hari ini.'\n", "pprint(string_news1)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "('ALOR SETAR - Kemelut politik Pakatan Harapan (PH) belum berkesudahan apabila '\n", " 'masih gagal memuktamadkan calon Perdana Menteri yang dipersetujui bersama. '\n", " 'Ahli Parlimen Sik, Ahmad Tarmizi Sulaiman berkata, sehubungan itu pihaknya '\n", " 'mencadangkan mantan Pengerusi Parti Pribumi Bersatu Malaysia (Bersatu), Tun '\n", " 'Dr Mahathir Mohamad dan Presiden Parti Keadilan Rakyat (PKR), Datuk Seri '\n", " 'Anwar Ibrahim mengundurkan diri daripada politik sebagai jalan penyelesaian.')\n" ] } ], "source": [ "# https://www.sinarharian.com.my/article/90021/BERITA/Politik/Tun-Mahathir-Anwar-disaran-bersara-untuk-selesai-kemelut-politik\n", "\n", "string_news2 = 'ALOR SETAR - Kemelut politik Pakatan Harapan (PH) belum berkesudahan apabila masih gagal memuktamadkan calon Perdana Menteri yang dipersetujui bersama. Ahli Parlimen Sik, Ahmad Tarmizi Sulaiman berkata, sehubungan itu pihaknya mencadangkan mantan Pengerusi Parti Pribumi Bersatu Malaysia (Bersatu), Tun Dr Mahathir Mohamad dan Presiden Parti Keadilan Rakyat (PKR), Datuk Seri Anwar Ibrahim mengundurkan diri daripada politik sebagai jalan penyelesaian.'\n", "pprint(string_news2)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "('Menteri Kanan (Kluster Keselamatan) Datuk Seri Ismail Sabri Yaakob berkata, '\n", " 'kelonggaran itu diberi berikutan kerajaan menyedari masalah yang dihadapi '\n", " 'mereka untuk memperbaharui dokumen itu. Katanya, selain itu, bagi rakyat '\n", " 'asing yang pas lawatan sosial tamat semasa Perintah Kawalan Pergerakan (PKP) '\n", " 'pula boleh ke pejabat Jabatan Imigresen yang terdekat untuk mendapatkan '\n", " 'lanjutan tempoh.')\n" ] } ], "source": [ "string_news3 = 'Menteri Kanan (Kluster Keselamatan) Datuk Seri Ismail Sabri Yaakob berkata, kelonggaran itu diberi berikutan kerajaan menyedari masalah yang dihadapi mereka untuk memperbaharui dokumen itu. Katanya, selain itu, bagi rakyat asing yang pas lawatan sosial tamat semasa Perintah Kawalan Pergerakan (PKP) pula boleh ke pejabat Jabatan Imigresen yang terdekat untuk mendapatkan lanjutan tempoh.'\n", "pprint(string_news3)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "('Selain itu, pameran kerjaya membantu para pelajar menentukan kerjaya yang '\n", " 'akan diceburi oleh mereka. Seperti yang kita ketahui, pasaran kerjaya di '\n", " 'Malaysia sangat luas dan masih banyak sektor pekerjaan di negara ini yang '\n", " 'masih kosong kerana sukar untuk mencari tenaga kerja yang benar-benar '\n", " 'berkelayakan. Sebagai contohnya, sektor perubatan di Malaysia menghadapi '\n", " 'masalah kekurangan tenaga kerja yang kritikal, khususnya tenaga pakar '\n", " 'disebabkan peletakan jawatan oleh doktor dan pakar perubatan untuk memasuki '\n", " 'sektor swasta serta berkembangnya perkhidmatan kesihatan dan perubatan. '\n", " 'Setelah menyedari hakikat ini, para pelajar akan lebih berminat untuk '\n", " 'menceburi bidang perubatan kerana pameran kerjaya yang dilaksanakan amat '\n", " 'membantu memberikan pengetahuan am tentang kerjaya ini')\n" ] } ], "source": [ "# https://qcikgubm.blogspot.com/2018/02/contoh-soalan-dan-jawapan-karangan.html\n", "\n", "string_karangan = 'Selain itu, pameran kerjaya membantu para pelajar menentukan kerjaya yang akan diceburi oleh mereka. Seperti yang kita ketahui, pasaran kerjaya di Malaysia sangat luas dan masih banyak sektor pekerjaan di negara ini yang masih kosong kerana sukar untuk mencari tenaga kerja yang benar-benar berkelayakan. Sebagai contohnya, sektor perubatan di Malaysia menghadapi masalah kekurangan tenaga kerja yang kritikal, khususnya tenaga pakar disebabkan peletakan jawatan oleh doktor dan pakar perubatan untuk memasuki sektor swasta serta berkembangnya perkhidmatan kesihatan dan perubatan. Setelah menyedari hakikat ini, para pelajar akan lebih berminat untuk menceburi bidang perubatan kerana pameran kerjaya yang dilaksanakan amat membantu memberikan pengetahuan am tentang kerjaya ini'\n", "pprint(string_karangan)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['TANGKAK - Tan Sri Muhyiddin Yassin said he did not want to touch on '\n", " 'political issues at the moment, instead focusing on the welfare of the '\n", " \"people and efforts to revitalize the affected country's economy following \"\n", " 'the Covid-19 pandemic. The prime minister explained the matter when speaking '\n", " 'at a Leadership Meeting with Gambir State Assembly (DUN) leaders at the '\n", " 'Bukit Gambir Multipurpose Hall today.',\n", " 'ALOR SETAR - Pakatan Harapan (PH) political turmoil has not ended when it '\n", " \"has failed to finalize the Prime Minister's candidate agreed upon. Sik MP \"\n", " 'Ahmad Tarmizi Sulaiman said he had suggested former United Nations (UN) '\n", " \"Indigenous Party chairman Tun Dr Mahathir Mohamad and People's Justice Party \"\n", " '(PKR) president Datuk Seri Anwar Ibrahim resign from politics as a solution.',\n", " 'Senior Minister (Security Cluster) Datuk Seri Ismail Sabri Yaakob said the '\n", " 'relaxation was given as the government was aware of the problems they had to '\n", " 'renew the document. He added that for foreigners who had passed the social '\n", " 'visit during the Movement Control Order (CPP) they could go to the nearest '\n", " 'Immigration Department office for further extension.',\n", " 'In addition, career exhibitions help students determine their careers. As we '\n", " 'know, the career market in Malaysia is very broad and there are still many '\n", " 'job sectors in the country that are still vacant because it is difficult to '\n", " 'find a truly qualified workforce. For example, the medical sector in '\n", " 'Malaysia is facing a critical shortage of labor, especially specialists due '\n", " 'to the resignation of doctors and physicians to enter the private sector and '\n", " 'develop health and medical services. Upon realizing this fact, students will '\n", " 'be more interested in medicine because the exhibition careers are very '\n", " 'helpful in providing general knowledge of this career.']\n", "CPU times: user 15.7 s, sys: 2.18 s, total: 17.9 s\n", "Wall time: 6.89 s\n" ] } ], "source": [ "%%time\n", "\n", "pprint(transformer.greedy_decoder([string_news1, string_news2, string_news3, string_karangan]))" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['TANGKAK - Tan Sri Muhyiddin Yassin said he did not want to touch on '\n", " 'political issues at this time, instead focusing on the welfare of the people '\n", " \"and efforts to revive the country's economy affected by the Covid-19 \"\n", " 'pandemic. The prime minister explained this when speaking at a meeting of '\n", " 'leaders with Gambir State Assembly (DUN) leaders at the Bukit Gambir '\n", " 'Multipurpose Hall today.',\n", " 'ALOR SETAR - Pakatan Harapan (PH) political turmoil has not ended when it '\n", " \"has failed to finalise the Prime Minister's candidate. Sik Member of \"\n", " 'Parliament Ahmad Tarmizi Sulaiman said he had suggested that former United '\n", " \"People's Party (UN) chairman Tun Dr Mahathir Mohamad and People's Justice \"\n", " 'Party (PKR) president Datuk Seri Anwar Ibrahim resign from politics as a '\n", " 'solution.',\n", " 'Senior Minister (Safety cluster) Datuk Seri Ismail Sabri Yaakob said the '\n", " 'relaxation was given as the government was aware of the problems they faced '\n", " 'to renew the document. He said that foreigners who had expired social visit '\n", " 'during the Movement Control Order (MCO) could go to the nearest Immigration '\n", " 'Department office for an extension.',\n", " 'In addition, career exhibitions help students determine the careers they '\n", " 'will pursue. As we know, the career market in Malaysia is vast and many job '\n", " 'sectors in the country are still vacant because it is difficult to find a '\n", " 'truly qualified workforce. For example, the medical sector in Malaysia is '\n", " 'facing critical labor shortages, especially specialists due to the '\n", " 'resignation of doctors and physicians to enter the private sector as well as '\n", " 'the development of health and medical services. Having realized this fact, '\n", " 'students will be more interested in pursuing medicine because the career '\n", " 'exhibitions are being implemented greatly to provide general knowledge of '\n", " 'this career.']\n", "CPU times: user 16.7 s, sys: 96.4 ms, total: 16.8 s\n", "Wall time: 1.48 s\n" ] } ], "source": [ "%%time\n", "\n", "pprint(transformer_huggingface.generate([string_news1, string_news2, string_news3, string_karangan],\n", " max_length = 1000))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### compare with Google translate using googletrans\n", "\n", "Install it by,\n", "\n", "```bash\n", "pip3 install googletrans==4.0.0rc1\n", "```" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "from googletrans import Translator\n", "\n", "translator = Translator()" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "strings = [string_news1, string_news2, string_news3, string_karangan]" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "TANGKAK - Tan Sri Muhyiddin Yassin said he did not want to touch on political issues at this time, instead of focusing on the welfare of the people and efforts to regenerate the country's economy following the Covid -19 pandemic.The prime minister explained the matter when speaking at a ceremony with a leader of the Gambir State Assembly (DUN) community leader at the Bukit Gambir Multipurpose Hall today.\n", "ALOR SETAR - The Pakatan Harapan (PH) political turmoil has not ended when it fails to finalize the agreed prime ministerial candidate.Sik Member of Parliament Ahmad Tarmizi Sulaiman said he had suggested former United Indigenous Party (UN) chairman Tun Dr Mahathir Mohamad and the People's Justice Party (PKR) president Datuk Seri Anwar Ibrahim resigned from politics as a solution.\n", "Senior Minister (Security Cluster) Datuk Seri Ismail Sabri Yaakob said the relaxation was given as the government was aware of the problem they had to renew the document.He said, for foreigners, the social visit ended during the Movement Control Order (CPP) could go to the nearest Immigration Department's office for extension.\n", "In addition, career exhibitions help students determine the careers they will be involved in.As we know, the career market in Malaysia is very broad and there are still many employment sectors in the country that are still vacant because it is difficult to find a truly qualified workforce.For example, the medical sector in Malaysia is facing critical workforce problems, especially experts due to the resignation of doctors and physicians to enter the private sector as well as the growth of health and medical services.Upon realizing this fact, students will be more interested in getting into medicine because their career exhibitions are very helpful in providing general knowledge of this career\n" ] } ], "source": [ "for t in strings:\n", " r = translator.translate(t, src='ms', dest = 'en')\n", " print(r.text)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" }, "varInspector": { "cols": { "lenName": 16, "lenType": 16, "lenVar": 40 }, "kernels_config": { "python": { "delete_cmd_postfix": "", "delete_cmd_prefix": "del ", "library": "var_list.py", "varRefreshCmd": "print(var_dic_list())" }, "r": { "delete_cmd_postfix": ") ", "delete_cmd_prefix": "rm(", "library": "var_list.r", "varRefreshCmd": "cat(var_dic_list()) " } }, "types_to_exclude": [ "module", "function", "builtin_function_or_method", "instance", "_Feature" ], "window_display": false } }, "nbformat": 4, "nbformat_minor": 4 }