MS to IND HuggingFace
This tutorial is available as an IPython notebook at Malaya/example/ms-ind-translation-huggingface.
This module was trained on standard language and on augmented local language structures; proceed with caution.
[1]:
import os
# An empty value hides all CUDA devices, so everything below runs on CPU.
os.environ['CUDA_VISIBLE_DEVICES'] = ''
[2]:
%%time
import malaya
import logging
logging.basicConfig(level=logging.INFO)
CPU times: user 3.81 s, sys: 3.51 s, total: 7.32 s
Wall time: 3.19 s
List available HuggingFace models#
[3]:
malaya.translation.ms_ind.available_huggingface()
INFO:malaya.translation.ms_ind:tested on FLORES200 MS-IND (zsm_Latn-ind_Latn) pair `dev` set, https://github.com/facebookresearch/flores/tree/main/flores200
[3]:
Model | Size (MB) | BLEU | SacreBLEU Verbose | SacreBLEU-chrF++-FLORES200 | Suggested length |
---|---|---|---|---|---|
mesolitica/finetune-translation-austronesian-t5-tiny-standard-bahasa-cased | 139 | 33.882077 | 67.7/42.2/28.0/18.9 (BP = 0.966 ratio = 0.966 ... | 59.46 | 512 |
mesolitica/finetune-translation-austronesian-t5-small-standard-bahasa-cased | 242 | 35.954481 | 66.3/42.6/29.1/20.3 (BP = 1.000 ratio = 1.014 ... | 61.02 | 512 |
mesolitica/finetune-translation-austronesian-t5-base-standard-bahasa-cased | 892 | 37.62068 | 70.0/45.8/31.7/22.5 (BP = 0.967 ratio = 0.968 ... | 62.1 | 512 |
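The table shows a size versus BLEU trade-off: each larger model scores higher on the FLORES200 `dev` set. As a minimal sketch of how one might choose between them, the snippet below copies the size and BLEU figures from the table and picks the highest-BLEU model that fits a size budget. `MODELS` and `pick_model` are hypothetical names introduced here for illustration; they are not part of Malaya.

```python
# Figures copied from the table above: (size in MB, BLEU on the FLORES200 dev set).
MODELS = {
    'mesolitica/finetune-translation-austronesian-t5-tiny-standard-bahasa-cased': (139, 33.882077),
    'mesolitica/finetune-translation-austronesian-t5-small-standard-bahasa-cased': (242, 35.954481),
    'mesolitica/finetune-translation-austronesian-t5-base-standard-bahasa-cased': (892, 37.62068),
}


def pick_model(max_size_mb):
    """Hypothetical helper: highest-BLEU model whose size fits the budget, or None."""
    fitting = {name: stats for name, stats in MODELS.items() if stats[0] <= max_size_mb}
    if not fitting:
        return None
    # Compare candidates by their BLEU score (second element of the tuple).
    return max(fitting, key=lambda name: fitting[name][1])


print(pick_model(300))
```

The returned name can then be passed as the `model` argument when loading a model.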
Load Transformer models#
def huggingface(
    model: str = 'mesolitica/finetune-translation-austronesian-t5-small-standard-bahasa-cased',
    force_check: bool = True,
    **kwargs,
):
    """
    Load HuggingFace model to translate MS-to-IND.

    Parameters
    ----------
    model: str, optional (default='mesolitica/finetune-translation-austronesian-t5-small-standard-bahasa-cased')
        Check available models at `malaya.translation.ms_ind.available_huggingface()`.

    Returns
    -------
    result: malaya.torch_model.huggingface.Generator
    """
[4]:
transformer_huggingface = malaya.translation.ms_ind.huggingface()
Translate#
def generate(self, strings: List[str], **kwargs):
    """
    Generate texts from the input.

    Parameters
    ----------
    strings : List[str]
    **kwargs: keyword arguments passed to the HuggingFace `generate` method.
        Read more at https://huggingface.co/docs/transformers/main_classes/text_generation

    Returns
    -------
    result: List[str]
    """
For better results, always split the input at sentence boundaries and translate each sentence separately.
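A minimal sketch of sentence-level translation, assuming a naive regex splitter is good enough for the input. `split_sentences` and `translate_by_sentence` are hypothetical helpers, not Malaya functions; the `generate` argument is any callable with the same shape as the `generate` method documented above (a list of strings in, a list of strings out).

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r'(?<=[.!?])\s+', text.strip())
    return [p for p in parts if p]


def translate_by_sentence(text: str, generate: Callable[..., List[str]], **kwargs) -> str:
    """Translate each sentence as its own batch item, then rejoin with spaces."""
    sentences = split_sentences(text)
    if not sentences:
        return ''
    # Extra keyword arguments (for example max_length) are forwarded unchanged.
    return ' '.join(generate(sentences, **kwargs))


# Assumed usage with the model loaded earlier in this tutorial:
# translate_by_sentence(text, transformer_huggingface.generate, max_length=1000)
```

The regex also splits after abbreviations such as "Dr.", so a proper sentence tokenizer is preferable for real text.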
[5]:
from pprint import pprint
[8]:
news = "SHAH ALAM - Kerajaan harus bersikap 'kejam' terhadap desakan segelintir rakyat berkenaan isu pengeluaran Kumpulan Wang Simpanan Pekerja (KWSP) demi kesejahteraan rakyat di masa akan datang."
[9]:
%%time
pprint(transformer_huggingface.generate([news], max_length=1000))
["SHAH ALAM - Pemerintah harus 'kejam' dengan desakan segelintir masyarakat "
'terkait isu penarikan Dana Penyediaan Pegawai (EPF) untuk kesejahteraan '
'masyarakat di masa depan.']
CPU times: user 3.53 s, sys: 0 ns, total: 3.53 s
Wall time: 298 ms
Compare with Google Translate using googletrans#
Install it with:
pip3 install googletrans==4.0.0rc1
[10]:
from googletrans import Translator
translator = Translator()
[11]:
strings = [news]
[12]:
for t in strings:
    r = translator.translate(t, src='ms', dest='id')
    print(r.text)
Shah Alam - Pemerintah harus 'kejam' terhadap desakan beberapa orang tentang masalah produksi Dana Penyedia Karyawan (EPF) untuk masa depan rakyat.
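To see where the two translations diverge, a word-level diff can help. The sketch below uses Python's standard `difflib` on the two outputs copied from the cells above. `word_diff` is a hypothetical helper, and this is only a surface comparison of wording, not a measure of translation quality.

```python
import difflib

# Outputs copied verbatim from the cells above.
malaya_output = (
    "SHAH ALAM - Pemerintah harus 'kejam' dengan desakan segelintir masyarakat "
    "terkait isu penarikan Dana Penyediaan Pegawai (EPF) untuk kesejahteraan "
    "masyarakat di masa depan."
)
google_output = (
    "Shah Alam - Pemerintah harus 'kejam' terhadap desakan beberapa orang "
    "tentang masalah produksi Dana Penyedia Karyawan (EPF) untuk masa depan rakyat."
)


def word_diff(a, b):
    """Return (tag, words_a, words_b) for every span where the word sequences differ."""
    words_a, words_b = a.split(), b.split()
    matcher = difflib.SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    return [
        (tag, words_a[i1:i2], words_b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != 'equal'
    ]


for tag, left, right in word_diff(malaya_output, google_output):
    print(tag, ' '.join(left), '->', ' '.join(right))
```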