NSFW Detection

This tutorial is available as an IPython notebook at Malaya/example/nsfw.

A simple, straightforward module that detects whether a text is NSFW (sexual or gambling content) or not.

[1]:
%%time
import malaya
CPU times: user 4.05 s, sys: 741 ms, total: 4.79 s
Wall time: 4.59 s

Get labels

[2]:
malaya.nsfw.label
[2]:
['sex', 'gambling', 'negative']
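The label list suggests that 'negative' is the non-NSFW class and that 'sex' and 'gambling' are the NSFW classes. Assuming that reading, a binary NSFW flag can be derived from any predicted label. The helper is_nsfw and the set NSFW_LABELS below are hypothetical, not part of Malaya.

```python
# Assumption: 'negative' means "not NSFW"; the other two labels are NSFW classes.
NSFW_LABELS = {'sex', 'gambling'}


def is_nsfw(label):
    """Hypothetical helper: True when a predicted label is an NSFW class."""
    return label in NSFW_LABELS


# A sample list of predicted labels.
predictions = ['sex', 'negative', 'gambling']
flags = [is_nsfw(p) for p in predictions]
print(flags)  # [True, False, True]
```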

Load lexicon model

Naive, but effective in practice. The lexicon was gathered at Malay-Dataset/corpus/nsfw.

def lexicon(**kwargs):
    """
    Load Lexicon NSFW model.

    Returns
    -------
    result : malaya.text.lexicon.nsfw.Lexicon class
    """
[3]:
lexicon_model = malaya.nsfw.lexicon()
[4]:
string1 = 'xxx sgt panas, best weh'
string2 = 'jmpa dekat kl sentral'
string3 = 'Rolet Dengan Wang Sebenar'

Predict a batch of strings

[5]:
lexicon_model.predict([string1, string2, string3])
[5]:
['sex', 'negative', 'gambling']
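The internals of the Lexicon class are not described here, so the following is only a minimal sketch of how a lexicon-based classifier could produce labels like these: tokenize the text, count hits against each category's word list, and fall back to 'negative' when nothing matches. The function lexicon_predict, the dictionary TOY_LEXICON and its word lists are hypothetical and are not the real Malay-Dataset/corpus/nsfw lexicon.

```python
import re

# Toy lexicon for illustration only; these entries are NOT the real
# Malay-Dataset/corpus/nsfw lexicon.
TOY_LEXICON = {
    'sex': {'xxx', 'lucah', 'bogel'},
    'gambling': {'rolet', 'judi', 'kasino', 'loteri'},
}


def lexicon_predict(strings, lexicon=TOY_LEXICON):
    """Label each string by the category with the most token hits.

    Strings with no hits in any category fall back to 'negative'.
    """
    results = []
    for text in strings:
        tokens = re.findall(r'\w+', text.lower())
        # Count how many tokens fall into each category's word set.
        hits = {label: sum(tok in words for tok in tokens)
                for label, words in lexicon.items()}
        best = max(hits, key=hits.get)
        results.append(best if hits[best] > 0 else 'negative')
    return results


print(lexicon_predict(['xxx sgt panas, best weh',
                       'jmpa dekat kl sentral',
                       'Rolet Dengan Wang Sebenar']))
# ['sex', 'negative', 'gambling']
```

Exact word matching is what makes this approach naive: misspellings or slang absent from the lexicon produce no hits and are labelled 'negative'.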

Load multinomial model

Starting from v3.4, all model interfaces follow the sklearn interface, such as predict and predict_proba.

def multinomial(**kwargs):
    """
    Load multinomial NSFW model.

    Returns
    -------
    result : malaya.model.ml.BAYES class
    """
[7]:
model = malaya.nsfw.multinomial()

Predict a batch of strings

[8]:
model.predict([string1, string2, string3])
[8]:
['sex', 'negative', 'gambling']

Predict a batch of strings with probabilities

[9]:
model.predict_proba([string1, string2, string3])
[9]:
[{'sex': 0.9357058034930408,
  'gambling': 0.02616353532998711,
  'negative': 0.03813066117697173},
 {'sex': 0.027541900360621846,
  'gambling': 0.03522626245360637,
  'negative': 0.9372318371857732},
 {'sex': 0.01865380888750343,
  'gambling': 0.9765340760395791,
  'negative': 0.004812115072918792}]
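Each probability dictionary can be post-processed in plain Python. The sketch below copies the probabilities printed above and shows two things: taking the highest-probability label reproduces the predict output, and summing the two NSFW classes gives an alternative rule with a tunable threshold. top_label, flag_nsfw and the 0.5 threshold are hypothetical choices, not Malaya defaults.

```python
# Probabilities copied from the predict_proba output above.
probas = [
    {'sex': 0.9357058034930408, 'gambling': 0.02616353532998711,
     'negative': 0.03813066117697173},
    {'sex': 0.027541900360621846, 'gambling': 0.03522626245360637,
     'negative': 0.9372318371857732},
    {'sex': 0.01865380888750343, 'gambling': 0.9765340760395791,
     'negative': 0.004812115072918792},
]


def top_label(proba):
    """Return the label with the highest probability (argmax)."""
    return max(proba, key=proba.get)


def flag_nsfw(proba, threshold=0.5):
    """Hypothetical rule: flag when P(sex) + P(gambling) exceeds the threshold."""
    return proba['sex'] + proba['gambling'] > threshold


print([top_label(p) for p in probas])  # ['sex', 'negative', 'gambling']
print([flag_nsfw(p) for p in probas])  # [True, False, True]
```

Raising the threshold trades recall for precision when flagging text.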