NSFW Detection

This tutorial is available as an IPython notebook at Malaya/example/nsfw.

A simple and straightforward module that detects whether a text is NSFW.

[1]:

%%time
import malaya

CPU times: user 4.05 s, sys: 741 ms, total: 4.79 s
Wall time: 4.59 s

Get label

[2]:

malaya.nsfw.label

[2]:

['sex', 'gambling', 'negative']

Load lexicon model

A naive but effective approach. The lexicon was gathered at Malay-Dataset/corpus/nsfw.

def lexicon(**kwargs):
    """
    Load Lexicon NSFW model.

    Returns
    -------
    result : malaya.text.lexicon.nsfw.Lexicon class
    """

[3]:

lexicon_model = malaya.nsfw.lexicon()

[4]:

string1 = 'xxx sgt panas, best weh'
string2 = 'jmpa dekat kl sentral'
string3 = 'Rolet Dengan Wang Sebenar'

Predict batch of strings

[5]:

lexicon_model.predict([string1, string2, string3])

[5]:

['sex', 'negative', 'gambling']
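
To illustrate the general idea behind a lexicon-based classifier, here is a minimal sketch. It is not Malaya's implementation: `TOY_LEXICON`, `lexicon_predict`, and the word lists are hypothetical, and the real lexicon at Malay-Dataset/corpus/nsfw is far larger. The sketch counts how many tokens of each text appear in each category's word list and falls back to 'negative' when nothing matches.

```python
import re

# Hypothetical toy lexicon for illustration only; not the Malaya lexicon.
TOY_LEXICON = {
    'sex': {'xxx', 'porn'},
    'gambling': {'rolet', 'kasino', 'judi'},
}


def lexicon_predict(texts, lexicon=TOY_LEXICON, default='negative'):
    """Label each text with the category whose lexicon words match most often."""
    results = []
    for text in texts:
        # Lowercase and split into word tokens.
        tokens = re.findall(r'\w+', text.lower())
        # Count lexicon hits per category.
        counts = {
            label: sum(token in words for token in tokens)
            for label, words in lexicon.items()
        }
        best = max(counts, key=counts.get)
        # No hit in any category means the text is treated as not NSFW.
        results.append(best if counts[best] > 0 else default)
    return results


print(lexicon_predict([
    'xxx sgt panas, best weh',
    'jmpa dekat kl sentral',
    'Rolet Dengan Wang Sebenar',
]))
# ['sex', 'negative', 'gambling']
```

A pure word-matching approach like this needs no training, which is why it can be effective despite being naive, but it cannot label text whose NSFW words are missing from the lexicon.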

Load multinomial model

Starting from v3.4, all model interfaces follow the sklearn interface:

def multinomial(**kwargs):
    """
    Load multinomial NSFW model.

    Returns
    -------
    result : malaya.model.ml.BAYES class
    """

[7]:

model = malaya.nsfw.multinomial()

Predict batch of strings

[8]:

model.predict([string1, string2, string3])

[8]:

['sex', 'negative', 'gambling']
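
For readers unfamiliar with multinomial naive Bayes, the following is a rough pure-Python sketch of the technique with an sklearn-style `fit` / `predict` / `predict_proba` interface. It is an assumption-laden illustration, not Malaya's model: the class `TinyMultinomialNB`, the whitespace tokenizer, and the three training sentences are all hypothetical, whereas the model loaded above is pre-trained.

```python
import math
from collections import Counter, defaultdict


class TinyMultinomialNB:
    """Toy multinomial naive Bayes with Laplace smoothing (illustration only)."""

    def fit(self, texts, labels):
        self.classes_ = sorted(set(labels))
        self.class_counts_ = Counter(labels)
        self.word_counts_ = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts_[label].update(text.lower().split())
        self.vocab_ = {tok for c in self.word_counts_.values() for tok in c}
        return self

    def _log_scores(self, text):
        total_docs = sum(self.class_counts_.values())
        scores = {}
        for label in self.classes_:
            counts = self.word_counts_[label]
            # Laplace smoothing: add one to every vocabulary word count.
            denom = sum(counts.values()) + len(self.vocab_)
            score = math.log(self.class_counts_[label] / total_docs)
            for tok in text.lower().split():
                if tok in self.vocab_:  # words never seen in training are ignored
                    score += math.log((counts[tok] + 1) / denom)
            scores[label] = score
        return scores

    def predict_proba(self, texts):
        results = []
        for text in texts:
            scores = self._log_scores(text)
            peak = max(scores.values())
            # Subtract the peak before exponentiating for numerical stability.
            exp = {label: math.exp(s - peak) for label, s in scores.items()}
            norm = sum(exp.values())
            results.append({label: v / norm for label, v in exp.items()})
        return results

    def predict(self, texts):
        return [max(p, key=p.get) for p in self.predict_proba(texts)]


# Hypothetical toy training data, one sentence per label.
toy_model = TinyMultinomialNB().fit(
    ['xxx panas', 'rolet wang sebenar', 'jumpa dekat kl sentral'],
    ['sex', 'gambling', 'negative'],
)
print(toy_model.predict(['xxx sgt panas', 'rolet dengan wang', 'jumpa kl']))
# ['sex', 'gambling', 'negative']
```

The per-text dictionaries returned by `predict_proba` in this sketch mirror the shape of the output Malaya returns, one probability per label.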

Predict batch of strings with probability

[9]:

model.predict_proba([string1, string2, string3])

[9]:

[{'sex': 0.9357058034930408,
  'gambling': 0.02616353532998711,
  'negative': 0.03813066117697173},
 {'sex': 0.027541900360621846,
  'gambling': 0.03522626245360637,
  'negative': 0.9372318371857732},
 {'sex': 0.01865380888750343,
  'gambling': 0.9765340760395791,
  'negative': 0.004812115072918792}]
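
The probability dictionaries above can be post-processed to apply a custom decision rule. The sketch below takes the most probable label and falls back to 'negative' when the model is not confident enough. The helper `top_label`, the 0.5 threshold, and the fallback choice are assumptions for illustration, not part of Malaya; the probabilities are rounded copies of the output above.

```python
# Rounded copies of the predict_proba output shown above.
probabilities = [
    {'sex': 0.9357, 'gambling': 0.0262, 'negative': 0.0381},
    {'sex': 0.0275, 'gambling': 0.0352, 'negative': 0.9372},
    {'sex': 0.0187, 'gambling': 0.9765, 'negative': 0.0048},
]


def top_label(scores, threshold=0.5, fallback='negative'):
    """Return the most probable label, or the fallback if it is below threshold."""
    label = max(scores, key=scores.get)
    return label if scores[label] >= threshold else fallback


labels = [top_label(scores) for scores in probabilities]
print(labels)
# ['sex', 'negative', 'gambling']
```

Raising the threshold trades recall for precision: fewer texts are flagged as NSFW, but those that are flagged carry higher model confidence.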