Jawi-to-Rumi

This tutorial is available as an IPython notebook at Malaya/example/jawi-rumi.

This module was trained on both standard and local (including social media) language structures, so it is safe to use for both.

Explanation

The converter at https://www.ejawi.net/converterV2.php?go=rumi is able to convert Rumi to Jawi using a heuristic method. Malaya generated a dataset with that heuristic, inverted the pairs, and trained a deep learning model to map Jawi back to Rumi.
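The inversion step above can be sketched in a few lines. This is a minimal illustration with hypothetical pairs, not the real corpus: the heuristic converter produces (rumi, jawi) pairs, and swapping each pair yields (jawi, rumi) training examples.

```python
# Hypothetical (rumi, jawi) pairs from the heuristic converter.
rumi_to_jawi = [
    ('comel', 'چوميل'),
    ('saya', 'ساي'),
]

# Invert each pair to get (jawi, rumi) training examples.
jawi_to_rumi = [(jawi, rumi) for rumi, jawi in rumi_to_jawi]

print(jawi_to_rumi[0])  # ('چوميل', 'comel')
```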

چوميل -> comel

[1]:
%%time
import malaya
CPU times: user 5.95 s, sys: 1.15 s, total: 7.1 s
Wall time: 9.05 s

Use deep learning model

Load LSTM + Bahdanau Attention Jawi to Rumi model.

If you are using TensorFlow 2, make sure TensorFlow Addons is already installed:

pip install tensorflow-addons -U
def deep_model(quantized: bool = False, **kwargs):
    """
    Load LSTM + Bahdanau Attention Jawi to Rumi model.
    Original size 11MB, quantized size 2.92MB.
    CER on test set: 0.09239719040982326
    WER on test set: 0.33811816744187656

    Parameters
    ----------
    quantized : bool, optional (default=False)
        if True, will load 8-bit quantized model.
        Quantized model is not necessarily faster; it depends entirely on the machine.

    Returns
    -------
    result: malaya.model.tf.Seq2SeqLSTM class
    """
[2]:
model = malaya.jawi_rumi.deep_model()

Load Quantized model

To load the 8-bit quantized model, simply pass quantized = True; the default is False.

Expect a slight accuracy drop from the quantized model, and it is not necessarily faster than the normal 32-bit float model; it depends entirely on the machine.
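The accuracy drop comes from rounding weights to 8 bits. The following is a minimal sketch of affine uint8 quantization (not Malaya's actual scheme): each float weight is mapped to a uint8 value with a scale and zero point, and dequantizing introduces a small, bounded rounding error.

```python
import numpy as np

# Toy float32 weights (hypothetical values for illustration).
w = np.array([-0.9, -0.1, 0.0, 0.4, 1.2], dtype=np.float32)

# Affine quantization: map the weight range onto [0, 255].
scale = (w.max() - w.min()) / 255.0
zero_point = np.round(-w.min() / scale)
q = np.clip(np.round(w / scale + zero_point), 0, 255).astype(np.uint8)

# Dequantize and measure the worst-case rounding error.
w_hat = (q.astype(np.float32) - zero_point) * scale
print(np.max(np.abs(w - w_hat)))  # bounded by roughly scale / 2
```

The maximum error is on the order of half the quantization step, which is why the accuracy drop is small rather than catastrophic.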

[3]:
quantized_model = malaya.jawi_rumi.deep_model(quantized = True)
Load quantized model will cause accuracy drop.

Predict

def predict(self, strings: List[str], beam_search: bool = False):
    """
    Convert to target string.

    Parameters
    ----------
    strings : List[str]
    beam_search : bool, optional (default=False)
        If True, use beam search decoder, else use greedy decoder.

    Returns
    -------
    result: List[str]
    """

If you want to speed up inference, set beam_search = False to use the greedy decoder.
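The speed difference comes from how the two decoders explore the output space. As a toy sketch (the probability matrix and function names below are hypothetical, not the model's real decoder): greedy decoding takes the single best token at each step, while beam search carries the top-k partial sequences forward, doing k times more work per step.

```python
import numpy as np

# Toy per-step token probabilities: rows are decoding steps, columns token ids.
probs = np.array([
    [0.1, 0.6, 0.3],
    [0.4, 0.5, 0.1],
    [0.7, 0.2, 0.1],
])

def greedy_decode(probs):
    # Greedy: pick the argmax token at each step independently.
    return [int(np.argmax(p)) for p in probs]

def beam_decode(probs, beam_size=2):
    # Beam search: keep the beam_size best partial sequences by log probability.
    beams = [([], 0.0)]  # (token sequence, cumulative log probability)
    for p in probs:
        candidates = [
            (seq + [tok], score + float(np.log(p[tok])))
            for seq, score in beams
            for tok in range(len(p))
        ]
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
    return beams[0][0]

print(greedy_decode(probs))  # [1, 1, 0]
print(beam_decode(probs))    # [1, 1, 0] -- same here, since steps are independent
```

Beam search can recover better sequences when early greedy choices are suboptimal, at the cost of the extra per-step work.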

[4]:
model.predict(['ساي سوك ماكن ايم', 'اياق اچق كوتوق', 'ايسوق بيرثداي ساي، جڬن لوڤا باوق هديه'])
[4]:
['saya suka makan im',
 'eak ack kotok',
 'aisuk berthday saya, jegan lupa bawak hadiah']
[5]:
quantized_model.predict(['ساي سوك ماكن ايم', 'اياق اچق كوتوق', 'ايسوق بيرثداي ساي، جڬن لوڤا باوق هديه'])
[5]:
['saya suka makan im',
 'eak ack kotok',
 'aisuk berthday saya, jegan lopa bawak hadiah']