Kesalahan Tatabahasa (Grammatical Errors)

This tutorial is available as an IPython notebook at Malaya/example/tatabahasa.

This module was trained only on standard language structure, so it is not safe to use on local language structure.

[1]:
import malaya
from pprint import pprint

List available Transformer models

[2]:
malaya.tatabahasa.available_transformer()
INFO:root:tested on 7.5k kesalahan tatabahasa texts.
[2]:
Model             Size (MB)  Quantized Size (MB)  WER
t5                1250.0     481.00               0.017890
small-t5          355.6      195.00               0.018797
tiny-t5           208.0      103.00               0.032804
super-tiny-t5     81.8       27.10                0.035114
3x-super-tiny-t5  18.3       4.46                 0.036762
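
The WER column is a word error rate, so lower is better. As a rough illustration of what such a score measures, here is a minimal sketch that divides the word-level edit distance by the number of reference words. The `word_error_rate` helper and its whitespace tokenisation are assumptions made for illustration; the output above does not say exactly how Malaya computed its figures.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words.

    Hypothetical helper for illustration; not part of Malaya.
    """
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError('reference must contain at least one word')

    # previous[j] is the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)
```

One wrong word in a five-token reference, for example, gives a rate of 1/5.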

Load Transformer model

def transformer(model: str = 'small-t5', quantized: bool = False, **kwargs):
    """
    Load Malaya transformer encoder-decoder model to correct a `kesalahan tatabahasa` text.

    Parameters
    ----------
    model : str, optional (default='small-t5')
        Model architecture supported. Allowed values:

        * ``'t5'`` - T5 BASE parameters.
        * ``'small-t5'`` - T5 SMALL parameters.
        * ``'tiny-t5'`` - T5 TINY parameters.
        * ``'super-tiny-t5'`` - T5 SUPER TINY parameters.
        * ``'3x-super-tiny-t5'`` - T5 3X SUPER TINY parameters.

    quantized : bool, optional (default=False)
        If True, loads the 8-bit quantized model.
        A quantized model is not necessarily faster; it depends entirely on the machine.

    Returns
    -------
    result: model
        List of model classes:

        * if `t5` in model, will return `malaya.model.t5.Tatabahasa`.
    """
[3]:
model = malaya.tatabahasa.transformer(model = 'small-t5')
INFO:root:running kesalahan-tatabahasa/small-t5 using device /device:CPU:0
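
The name passed to `malaya.tatabahasa.transformer` can be chosen from the table printed by `available_transformer()`. The sketch below hard-codes the quantized sizes and WER values from that table and picks the smallest model that stays within a WER budget. `QUANTIZED_MODELS` and `pick_model` are hypothetical names, not part of Malaya.

```python
from typing import Dict, Tuple

# (quantized size in MB, WER), copied from the available_transformer() output.
QUANTIZED_MODELS: Dict[str, Tuple[float, float]] = {
    't5': (481.0, 0.017890),
    'small-t5': (195.0, 0.018797),
    'tiny-t5': (103.0, 0.032804),
    'super-tiny-t5': (27.1, 0.035114),
    '3x-super-tiny-t5': (4.46, 0.036762),
}


def pick_model(max_wer: float) -> str:
    """Return the smallest quantized model whose WER is at most max_wer."""
    candidates = [
        (size, name)
        for name, (size, wer) in QUANTIZED_MODELS.items()
        if wer <= max_wer
    ]
    if not candidates:
        raise ValueError(f'no model has WER <= {max_wer}')
    # Tuples sort by size first, so min() yields the smallest model.
    return min(candidates)[1]


name = pick_model(max_wer=0.02)
print(name)
```

The chosen name could then be passed on as `malaya.tatabahasa.transformer(model = name, quantized = True)`, using the `quantized` flag described in the docstring.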

Predict using greedy decoder

def greedy_decoder(self, strings: List[str]):
    """
    fix kesalahan tatabahasa.

    Parameters
    ----------
    strings: List[str]

    Returns
    -------
    result: List[str]
    """
[4]:
# https://ms.wikipedia.org/wiki/Bola_sepak
string = 'Pada amnya, hanya penjaga gol sahaja yang dibenarkan menyentuh bola dengan tangan di dalam kawasan golnya'
[5]:
model.greedy_decoder([string])
[5]:
['Pada amnya , hanya penjaga gol sahaja yang dibenarkan menyentuh bola dengan tangan di dalam kawasan golnya']
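
Note that the returned sentence has a space before the punctuation mark (`amnya ,`). If the original spacing is needed, a small post-processing step can reattach punctuation. The `detokenize` helper below is a hypothetical sketch, not a Malaya function, and it only handles the common marks listed in its pattern.

```python
import re


def detokenize(text: str) -> str:
    """Remove the whitespace that precedes , . ; : ! ? in decoder output."""
    return re.sub(r'\s+([,.;:!?])', r'\1', text)


output = 'Pada amnya , hanya penjaga gol sahaja yang dibenarkan menyentuh bola dengan tangan di dalam kawasan golnya'
print(detokenize(output))
```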

Now assume the sentence contains a noun-phrase error (kesalahan frasa nama), with penjaga gol reversed to gol penjaga.

[6]:
# https://ms.wikipedia.org/wiki/Bola_sepak
string = 'Pada amnya, hanya gol penjaga sahaja yang dibenarkan menyentuh bola dengan tangan di dalam kawasan golnya'
[7]:
model.greedy_decoder([string])
[7]:
['Pada amnya , hanya penjaga gol sahaja yang dibenarkan menyentuh bola dengan tangan di dalam kawasan golnya']
[8]:
string = 'Sani mendapat markah yang tertinggi sekali.'
string1 = 'Hassan ialah peserta yang termuda sekali dalam pertandingan itu.'
model.greedy_decoder([string, string1])
[8]:
['Sani mendapat markah yang tertinggi .',
 'Hassan ialah peserta yang termuda dalam pertandingan itu .']
[19]:
string = 'Dia kata kepada saya.'
model.greedy_decoder([string])
[19]:
['Dia berkata kepada saya .']
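
To see exactly which words the model changed, the input and output can be compared token by token. The following is a minimal sketch using Python's standard `difflib`; `tokenize` and `changed_words` are hypothetical helpers, and the regular expression is a simple assumption about how to separate words from punctuation so that the spaced punctuation in the output does not register as a difference.

```python
import difflib
import re
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Split into words (keeping hyphenated words whole) and punctuation marks."""
    return re.findall(r'\w+(?:-\w+)*|[^\w\s]', text)


def changed_words(before: str, after: str) -> List[Tuple[str, List[str], List[str]]]:
    """Return (operation, removed tokens, added tokens) for every non-equal span."""
    a, b = tokenize(before), tokenize(after)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != 'equal'
    ]


print(changed_words('Dia kata kepada saya.', 'Dia berkata kepada saya .'))
# [('replace', ['kata'], ['berkata'])]
```

An empty list means the model returned the sentence unchanged apart from punctuation spacing.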

More examples

The following sentences are copied from https://ms.wikipedia.org/wiki/Kesalahan_biasa_tatabahasa_Melayu

[11]:
string = 'Tidak ada apa yang mereka risaukan waktu itu.'
model.greedy_decoder([string])
[11]:
['Tidak ada apa yang mereka risaukan waktu itu .']
[12]:
string = 'Ayahnya setuju walaupun melanggar syarat yang dia sendiri menetapkan.'
model.greedy_decoder([string])
[12]:
['Ayahnya setuju walaupun melanggar syarat yang dia sendiri menetapkan .']
[14]:
string = 'Semuanya dia kenal.'
model.greedy_decoder([string])
[14]:
['Semuanya dia kenal .']
[15]:
string = 'Dia menjawab seperti disuruh-suruh oleh kuasa yang dia tidak tahu dari mana puncanya.'
model.greedy_decoder([string])
[15]:
['Dia menjawab seperti disuruh-suruh oleh kuasa yang dia tidak tahu dari mana puncanya .']
[16]:
string = 'Bola ini ditendang oleh saya.'
model.greedy_decoder([string])
[16]:
['Bola ini ditendang oleh saya .']
[17]:
string = 'Makanan ini kamu telah makan?'
model.greedy_decoder([string])
[17]:
['Makanan ini kamu telah makan .']
[18]:
string = 'Segala perubahan yang berlaku kita akan menghadapi sama-sama.'
model.greedy_decoder([string])
[18]:
['Segala perubahan yang berlaku kita akan menghadapi sama-sama .']
[20]:
string = 'Kakak saya sedang memasak gulai nangka. Dia menyenduk seketul nangka gulai dan menyuruh saya merasanya.'
model.greedy_decoder([string])
[20]:
['Kakak saya sedang memasak gulai nangka . Dia menyenduk seketul gulai nangka dan menyuruh saya merasanya .']
[22]:
string = 'Sally sedang membaca bila saya tiba di rumahnya.'
model.greedy_decoder([string])
[22]:
['Sally sedang membaca bila dia tiba di rumahnya .']
[23]:
string = 'Badannya besar kecuali kakinya kecil.'
model.greedy_decoder([string])
[23]:
['Badannya besar dan kakinya kecil .']
[24]:
string = 'Beribu peniaga tidak membayar cukai pendapatan.'
model.greedy_decoder([string])
[24]:
['Beribu peniaga tidak membayar cukai pendapatan .']
[25]:
string = 'Setengah remaja suka membuang masa di pasar raya.'
model.greedy_decoder([string])
[25]:
['Setengah remaja suka membuang masa di pasar raya .']
[26]:
string = 'Umar telah berpindah daripada sekolah ini bulan lalu.'
model.greedy_decoder([string])
[26]:
['Umar telah berpindah ke sekolah ini bulan lalu .']
[28]:
string = 'Para-para peserta sedang berbaris.'
model.greedy_decoder([string])
[28]:
['Para peserta sedang berbaris .']
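
Because `greedy_decoder` accepts a list of strings, sentences like the ones above could also be corrected in a few batched calls instead of one call per cell. A minimal sketch, assuming only that `model` exposes `greedy_decoder(strings)` as documented; `correct_in_batches` and its default batch size are hypothetical choices, not part of Malaya.

```python
from typing import Iterable, List


def correct_in_batches(model, sentences: Iterable[str], batch_size: int = 8) -> List[str]:
    """Call model.greedy_decoder on fixed-size chunks and concatenate the results."""
    if batch_size < 1:
        raise ValueError('batch_size must be at least 1')
    sentences = list(sentences)
    results: List[str] = []
    for start in range(0, len(sentences), batch_size):
        batch = sentences[start:start + batch_size]
        results.extend(model.greedy_decoder(batch))
    return results
```

With the model loaded earlier, this would be called as `correct_in_batches(model, [string, string1])`. A smaller batch size keeps memory use down on machines where the larger models are slow.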