Stemmer

Note

This tutorial is available as an IPython notebook here.

%%time
import malaya
CPU times: user 4.85 s, sys: 1.29 s, total: 6.14 s
Wall time: 7.76 s

Use deep learning model

model = malaya.stem.deep_model()
string = 'Benda yg SALAH ni, jgn lah didebatkan. Yg SALAH xkan jadi betul. Ingat tu. Mcm mana kesat sekalipun org sampaikan mesej, dan memang benda tu salah, diam je. Xyah nk tunjuk kau open sangat nk tegur cara org lain berdakwah'
another_string = 'melayu bodoh, dah la gay, sokong lgbt lagi, memang tak guna, http://twitter.com'
model.stem(string)
'Benda yg SALAH ni , jgn lah debat . Yg SALAH xkan jadi betul . Ingat tu . Mcm mana kesat sekalipun org sampai mesej , dan memang benda tu salah , diam je . Xyah nk tunjuk kau open sangat nk tegur cara org lain dakwah'
model.stem(another_string)
'layu bodoh , dah la gay , sokong lgbt lagi , memang tak guna , http://twitter.com'
model.stem('saya menyerukanlah')
'saya seru'

Use Sastrawi stemmer

Malaya also includes an interface to the Sastrawi stemmer, which we also use internally. To use it, simply call:

malaya.stem.sastrawi(str)

However, it cannot preserve tokens such as URLs, hashtags, money amounts, datetimes and user mentions; they are broken apart or stripped, as the first output below shows.

malaya.stem.sastrawi(another_string)
'melayu bodoh dah la gay sokong lgbt lagi memang tak guna http twitter com'
malaya.stem.sastrawi('saya menyerukanlah')
'saya seru'
malaya.stem.sastrawi('menarik')
'tarik'
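
Because the Sastrawi interface breaks up URLs and similar tokens, one possible workaround is to stem token by token and skip anything that looks like a URL, hashtag or user mention. The sketch below is not part of Malaya: `stem_preserving` and the `PROTECTED` pattern are hypothetical names introduced here for illustration, and the pattern is a rough assumption that does not cover money or datetime expressions. The stemmer is passed in as a plain callable, so a toy stand-in is used to keep the example runnable without Malaya installed.

```python
import re
from typing import Callable

# Hypothetical pattern (an assumption, not Malaya's own rules) for tokens
# that should pass through unchanged: URLs, hashtags and user mentions.
PROTECTED = re.compile(r'^(https?://\S+|www\.\S+|#\w+|@\w+)$')


def stem_preserving(text: str, stem_word: Callable[[str], str]) -> str:
    """Stem whitespace-separated tokens, leaving protected tokens untouched."""
    out = []
    for token in text.split():
        if PROTECTED.match(token):
            # Keep URLs, hashtags and mentions exactly as written.
            out.append(token)
        else:
            stemmed = stem_word(token)
            # Drop tokens the stemmer reduces to nothing (e.g. bare punctuation).
            if stemmed:
                out.append(stemmed)
    return ' '.join(out)


# Toy stand-in for a real stemmer: lowercases and strips non-letters only.
def toy_stem(word: str) -> str:
    return re.sub(r'[^a-z]', '', word.lower())


print(stem_preserving('tak guna, http://twitter.com #Malaya @user', toy_stem))
# prints: tak guna http://twitter.com #Malaya @user
```

With Malaya installed, `malaya.stem.sastrawi` could presumably be passed as `stem_word`, since it takes a string and returns a string in the examples above. Calling it once per token may be slower than a single call on the whole sentence.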