Emotion Analysis¶
This tutorial is available as an IPython notebook at Malaya/example/emotion.
This module was trained on both standard and local (including social media) language structures, so it is safe to use for both.
[1]:
%%time
import malaya
CPU times: user 4.85 s, sys: 677 ms, total: 5.53 s
Wall time: 4.67 s
[2]:
anger_text = 'babi la company ni, aku dah la penat datang dari jauh'
fear_text = 'takut doh tengok cerita hantu tadi'
happy_text = 'bestnya dapat tidur harini, tak payah pergi kerja'
love_text = 'aku sayang sgt dia dah doh'
sadness_text = 'kecewa tengok kerajaan baru ni, janji ape pun tak dapat'
surprise_text = 'sakit jantung aku, terkejut dengan cerita hantu tadi'
Get label¶
[3]:
malaya.emotion.label
[3]:
['anger', 'fear', 'happy', 'love', 'sadness', 'surprise']
All models follow the same sklearn-like interface: predict
returns a batch of labels, predict_proba
returns a batch of probabilities.
Load multinomial model¶
All model interfaces follow the sklearn interface starting from v3.4,
model.predict(List[str])
model.predict_proba(List[str])
[5]:
model = malaya.emotion.multinomial()
[6]:
model.predict([anger_text])
[6]:
['anger']
[8]:
model.predict(
[anger_text, fear_text, happy_text, love_text, sadness_text, surprise_text]
)
[8]:
['anger', 'fear', 'happy', 'love', 'sadness', 'surprise']
[9]:
model.predict_proba(
[anger_text, fear_text, happy_text, love_text, sadness_text, surprise_text]
)
[9]:
[{'anger': 0.32948272681734814,
'fear': 0.13959708810717708,
'happy': 0.14671455153216045,
'love': 0.12489192355631354,
'sadness': 0.1285972541671178,
'surprise': 0.13071645581988448},
{'anger': 0.11379406005377896,
'fear': 0.4006934391283133,
'happy': 0.11389665647702245,
'love': 0.12481915233837086,
'sadness': 0.0991261507380643,
'surprise': 0.14767054126445014},
{'anger': 0.15051890586527464,
'fear': 0.13931406415515296,
'happy': 0.32037710031973415,
'love': 0.13747954667255546,
'sadness': 0.11565866743099411,
'surprise': 0.13665171555628927},
{'anger': 0.1590563839629243,
'fear': 0.14687344690114268,
'happy': 0.1419948160674701,
'love': 0.279550441361504,
'sadness': 0.1285927908584157,
'surprise': 0.14393212084854254},
{'anger': 0.14268176425895224,
'fear': 0.12178299725318226,
'happy': 0.16187751258299898,
'love': 0.1030494733572262,
'sadness': 0.34277869755707796,
'surprise': 0.1278295549905621},
{'anger': 0.06724850384395685,
'fear': 0.1283628050361525,
'happy': 0.05801958643852813,
'love': 0.06666524240157067,
'sadness': 0.06537667186293224,
'surprise': 0.6143271904168589}]
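If you only need the dominant label and its score for each string, you can reduce each probability dict yourself; a minimal sketch in plain Python:

probs = model.predict_proba([anger_text, surprise_text])
for p in probs:
    top = max(p, key = p.get)  # label with the highest probability
    print(top, p[top])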
List available Transformer models¶
[3]:
malaya.emotion.available_transformer()
INFO:root:tested on 20% test set.
[3]:
Model | Size (MB) | Quantized Size (MB) | Accuracy
---|---|---|---
bert | 425.6 | 111.00 | 0.992
tiny-bert | 57.4 | 15.40 | 0.988
albert | 48.6 | 12.80 | 0.997
tiny-albert | 22.4 | 5.98 | 0.981
xlnet | 446.5 | 118.00 | 0.990
alxlnet | 46.8 | 13.30 | 0.989
Make sure you check the accuracy chart here before selecting a model: https://malaya.readthedocs.io/en/latest/Accuracy.html#emotion-analysis
You might want to use tiny-albert: at only 22.4 MB it is very small, yet its accuracy is still top-notch.
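Switching between Transformer models only requires changing the model name; a minimal sketch, assuming the names in the table map directly to the model parameter:

small_model = malaya.emotion.transformer(model = 'tiny-albert')
small_model.predict([anger_text])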
Load Albert model¶
All model interfaces follow the sklearn interface starting from v3.4,
model.predict(List[str])
model.predict_proba(List[str])
[4]:
model = malaya.emotion.transformer(model = 'albert')
INFO:tensorflow:loading sentence piece model
Load Quantized model¶
To load the 8-bit quantized model, simply pass quantized = True
; the default is False
.
Expect a slight accuracy drop from the quantized model, and it is not necessarily faster than the normal 32-bit float model; that depends entirely on the machine. A quick benchmark sketch follows the loading cell below.
[3]:
quantized_model = malaya.emotion.transformer(model = 'albert', quantized = True)
WARNING:root:Load quantized model will cause accuracy drop.
WARNING:tensorflow:From /Users/huseinzolkepli/Documents/Malaya/malaya/function/__init__.py:74: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.
WARNING:tensorflow:From /Users/huseinzolkepli/Documents/Malaya/malaya/function/__init__.py:76: The name tf.GraphDef is deprecated. Please use tf.compat.v1.GraphDef instead.
WARNING:tensorflow:From /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/albert/tokenization.py:240: The name tf.logging.info is deprecated. Please use tf.compat.v1.logging.info instead.
INFO:tensorflow:loading sentence piece model
WARNING:tensorflow:From /Users/huseinzolkepli/Documents/Malaya/malaya/function/__init__.py:69: The name tf.InteractiveSession is deprecated. Please use tf.compat.v1.InteractiveSession instead.
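Since speed depends on the machine, you can time both models yourself; a minimal sketch of an informal benchmark:

import time

for name, m in [('float32', model), ('quantized', quantized_model)]:
    start = time.time()
    m.predict_proba([anger_text])
    print(name, round(time.time() - start, 3), 'seconds')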
Predict batch of strings¶
[16]:
model.predict_proba(
[anger_text, fear_text, happy_text, love_text, sadness_text, surprise_text]
)
[16]:
[{'anger': 0.9998901,
'fear': 3.2524113e-05,
'happy': 2.620931e-05,
'love': 2.2871463e-05,
'sadness': 9.782951e-06,
'surprise': 1.8502667e-05},
{'anger': 1.6941378e-05,
'fear': 0.9999205,
'happy': 9.070281e-06,
'love': 2.044179e-05,
'sadness': 6.7731107e-06,
'surprise': 2.6314676e-05},
{'anger': 0.15370166,
'fear': 0.0013852724,
'happy': 0.8268689,
'love': 0.011433229,
'sadness': 0.0011807577,
'surprise': 0.005430276},
{'anger': 1.2597201e-05,
'fear': 1.7600481e-05,
'happy': 9.667115e-06,
'love': 0.9999331,
'sadness': 1.3735416e-05,
'surprise': 1.3399296e-05},
{'anger': 1.9176923e-05,
'fear': 1.1163729e-05,
'happy': 6.353941e-06,
'love': 7.004002e-06,
'sadness': 0.99994576,
'surprise': 1.0511084e-05},
{'anger': 5.8739704e-05,
'fear': 1.9771342e-05,
'happy': 1.8316741e-05,
'love': 2.2319455e-05,
'sadness': 3.646786e-05,
'surprise': 0.9998443}]
[4]:
quantized_model.predict_proba(
[anger_text, fear_text, happy_text, love_text, sadness_text, surprise_text]
)
[4]:
[{'anger': 0.99988353,
'fear': 3.5938003e-05,
'happy': 2.7778764e-05,
'love': 2.3541537e-05,
'sadness': 9.574292e-06,
'surprise': 1.9607493e-05},
{'anger': 1.6855265e-05,
'fear': 0.9999219,
'happy': 9.185196e-06,
'love': 2.0216348e-05,
'sadness': 6.6679663e-06,
'surprise': 2.5186611e-05},
{'anger': 0.22842072,
'fear': 0.001628682,
'happy': 0.7477462,
'love': 0.014303649,
'sadness': 0.0013838055,
'surprise': 0.00651699},
{'anger': 1.28296715e-05,
'fear': 1.7833345e-05,
'happy': 9.577061e-06,
'love': 0.9999324,
'sadness': 1.3832815e-05,
'surprise': 1.34745715e-05},
{'anger': 1.9776813e-05,
'fear': 1.1116885e-05,
'happy': 6.3422367e-06,
'love': 6.905633e-06,
'sadness': 0.9999455,
'surprise': 1.0316757e-05},
{'anger': 5.8218586e-05,
'fear': 2.07504e-05,
'happy': 1.8061248e-05,
'love': 2.1852256e-05,
'sadness': 3.5944133e-05,
'surprise': 0.99984515}]
Open emotion visualization dashboard¶
By default, calling predict_words
opens a browser with a visualization dashboard; you can disable this with visualization=False
.
[ ]:
model.predict_words(sadness_text)
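If you are running outside a notebook or only want the underlying result, you can skip the dashboard; a minimal sketch, assuming the method returns its result when the dashboard is disabled:

result = model.predict_words(sadness_text, visualization = False)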
[18]:
from IPython.core.display import Image, display
display(Image('emotion-dashboard.png', width=800))

Vectorize¶
Let's say you want to visualize sentences / words in a lower dimension; you can use model.vectorize
,
def vectorize(self, strings: List[str], method: str = 'first'):
"""
vectorize list of strings.
Parameters
----------
strings: List[str]
method : str, optional (default='first')
Vectorization layer supported. Allowed values:
* ``'last'`` - vector from last sequence.
* ``'first'`` - vector from first sequence.
* ``'mean'`` - average vectors from all sequences.
* ``'word'`` - average vectors based on tokens.
Returns
-------
result: np.array
"""
Sentence level¶
[5]:
texts = [anger_text, fear_text, happy_text, love_text, sadness_text, surprise_text]
r = quantized_model.vectorize(texts, method = 'first')
[6]:
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt
tsne = TSNE().fit_transform(r)
tsne.shape
[6]:
(6, 2)
[8]:
plt.figure(figsize = (7, 7))
plt.scatter(tsne[:, 0], tsne[:, 1])
labels = texts
for label, x, y in zip(
    labels, tsne[:, 0], tsne[:, 1]
):
    label = (
        '%s, %.3f' % (label[0], label[1])
        if isinstance(label, list)
        else label
    )
    plt.annotate(
        label,
        xy = (x, y),
        xytext = (0, 0),
        textcoords = 'offset points',
    )

Word level¶
[9]:
r = quantized_model.vectorize(texts, method = 'word')
[10]:
x, y = [], []
for row in r:
    x.extend([i[0] for i in row])  # tokens
    y.extend([i[1] for i in row])  # corresponding word vectors
[11]:
tsne = TSNE().fit_transform(y)
tsne.shape
[11]:
(49, 2)
[12]:
plt.figure(figsize = (7, 7))
plt.scatter(tsne[:, 0], tsne[:, 1])
labels = x
for label, x, y in zip(
    labels, tsne[:, 0], tsne[:, 1]
):
    label = (
        '%s, %.3f' % (label[0], label[1])
        if isinstance(label, list)
        else label
    )
    plt.annotate(
        label,
        xy = (x, y),
        xytext = (0, 0),
        textcoords = 'offset points',
    )

Pretty good: the model is able to group the top-right cluster as the surprise emotion.
Stacking models¶
For more information, you can read https://malaya.readthedocs.io/en/latest/Stack.html
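Conceptually, stacking combines the per-label probabilities from each model into a single score; a minimal sketch of the idea using a geometric mean (my assumption for the combiner, see the Stack docs above for the exact behaviour):

import numpy as np

def gmean_stack(prob_dicts):
    # combine one probability dict per model for the same string
    return {
        label: float(np.exp(np.mean([np.log(d[label]) for d in prob_dicts])))
        for label in prob_dicts[0]
    }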
[4]:
multinomial = malaya.emotion.multinomial()
[6]:
malaya.stack.predict_stack([multinomial, model], [anger_text])
[6]:
[{'anger': 0.5739743139312979,
'fear': 0.002130791264743306,
'happy': 0.0019609404077070573,
'love': 0.0016901068202818533,
'sadness': 0.001121633002361737,
'surprise': 0.0015551851123993595}]
[7]:
malaya.stack.predict_stack([multinomial, model], [anger_text, sadness_text])
[7]:
[{'anger': 0.5739743139312979,
'fear': 0.002130791264743306,
'happy': 0.0019609404077070573,
'love': 0.0016901068202818533,
'sadness': 0.001121633002361737,
'surprise': 0.0015551858768478731},
{'anger': 0.0016541454680912208,
'fear': 0.0011659984542562358,
'happy': 0.001014179551389293,
'love': 0.0008495638318424924,
'sadness': 0.5854571761989077,
'surprise': 0.001159149836587787}]