Stacking#
This tutorial is available as an IPython notebook at Malaya/example/stacking.
Why Stacking?#
Sometimes a single model is not good enough, so you combine the predictions of multiple models to get a better result. This is called stacking.
[1]:
%%time
import malaya
CPU times: user 5.67 s, sys: 1.34 s, total: 7 s
Wall time: 7.97 s
/Users/huseinzolkepli/Documents/Malaya/malaya/preprocessing.py:259: FutureWarning: Possible nested set at position 2289
self.tok = re.compile(r'({})'.format('|'.join(pipeline)))
[3]:
albert = malaya.sentiment.transformer('albert', quantized = True)
alxlnet = malaya.sentiment.transformer('alxlnet', quantized = True)
multinomial = malaya.sentiment.multinomial()
WARNING:root:Load quantized model will cause accuracy drop.
INFO:tensorflow:loading sentence piece model
INFO:tensorflow:loading sentence piece model
WARNING:root:Load quantized model will cause accuracy drop.
Stack multiple sentiment models#
malaya.stack.predict_stack provides an easy stacking solution for Malaya models. It is not limited to sentiment models; any classification model can be used with malaya.stack.predict_stack.
def predict_stack(
    models, strings: List[str], aggregate: Callable = gmean, **kwargs
):
    """
    Stacking for predictive models.

    Parameters
    ----------
    models: List[Callable]
        list of models.
    strings: List[str]
    aggregate : Callable, optional (default=scipy.stats.mstats.gmean)
        Aggregate function.

    Returns
    -------
    result: dict
    """
[4]:
malaya.stack.predict_stack([albert, multinomial, alxlnet],
['harga minyak tak menentu'])
[4]:
[{'negative': 0.5016266912464752,
'positive': 4.4445397894955644e-05,
'neutral': 0.004399656207132555}]
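To make the aggregation step concrete, here is a minimal sketch of combining per-class probabilities with a geometric mean, the default aggregate named in the signature above. It is not Malaya's implementation: stack_probabilities and the outputs list are hypothetical names with made-up numbers, and statistics.geometric_mean from the standard library stands in for scipy.stats.mstats.gmean.

```python
from statistics import geometric_mean
from typing import Callable, Dict, List


def stack_probabilities(
    per_model: List[Dict[str, float]],
    aggregate: Callable[[List[float]], float] = geometric_mean,
) -> Dict[str, float]:
    """Combine per-class probabilities from several models for one string."""
    labels = per_model[0].keys()
    # For each label, gather that label's probability from every model
    # and reduce the list to a single score with the aggregate function.
    return {label: aggregate([p[label] for p in per_model]) for label in labels}


# Made-up probabilities from three hypothetical models for a single string.
outputs = [
    {'negative': 0.9, 'positive': 0.1},
    {'negative': 0.4, 'positive': 0.6},
    {'negative': 0.6, 'positive': 0.4},
]
stacked = stack_probabilities(outputs)
print(stacked)
```

Geometric means of probabilities generally do not sum to 1, so under this sketch the stacked values are better read as relative scores than as a normalized distribution. A geometric mean also drops sharply when any one model assigns a label a very low probability, which makes it stricter than an arithmetic mean.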
To disable the neutral class, simply pass add_neutral = False.
[5]:
malaya.stack.predict_stack([albert, multinomial, alxlnet],
['harga minyak tak menentu'], add_neutral = False)
[5]:
[{'negative': 0.8257116478969977, 'positive': 0.0016922961136002735}]
Stack tagging models#
For tagging models, we use majority-voting stacking. You need more than 2 models for the vote to be meaningful; with only 2 models, a disagreement is resolved by picking randomly between them. malaya.stack.voting_stack provides an easy interface for this kind of stacking, but it can only be used for Entities, POS and Dependency Parsing models.
def voting_stack(models, text):
    """
    Stacking for POS and Entities Recognition models.

    Parameters
    ----------
    models: list
        list of models
    text: str
        string to predict

    Returns
    -------
    result: list
    """
[9]:
string = 'KUALA LUMPUR: Sempena sambutan Aidilfitri minggu depan, Perdana Menteri Tun Dr Mahathir Mohamad dan Menteri Pengangkutan Anthony Loke Siew Fook menitipkan pesanan khas kepada orang ramai yang mahu pulang ke kampung halaman masing-masing. Dalam video pendek terbitan Jabatan Keselamatan Jalan Raya (JKJR) itu, Dr Mahathir menasihati mereka supaya berhenti berehat dan tidur sebentar sekiranya mengantuk ketika memandu.'
albert = malaya.pos.transformer('albert')
bert = malaya.pos.transformer('bert')
malaya.stack.voting_stack([albert, bert], string)
[9]:
[('Kuala', 'PROPN'),
('Lumpur:', 'PROPN'),
('Sempena', 'ADP'),
('sambutan', 'NOUN'),
('Aidilfitri', 'PROPN'),
('minggu', 'NOUN'),
('depan,', 'ADJ'),
('Perdana', 'PROPN'),
('Menteri', 'PROPN'),
('Tun', 'PROPN'),
('Dr', 'PROPN'),
('Mahathir', 'PROPN'),
('Mohamad', 'PROPN'),
('dan', 'CCONJ'),
('Menteri', 'PROPN'),
('Pengangkutan', 'PROPN'),
('Anthony', 'PROPN'),
('Loke', 'PROPN'),
('Siew', 'PROPN'),
('Fook', 'PROPN'),
('menitipkan', 'VERB'),
('pesanan', 'NOUN'),
('khas', 'ADJ'),
('kepada', 'ADP'),
('orang', 'NOUN'),
('ramai', 'ADJ'),
('yang', 'PRON'),
('mahu', 'ADV'),
('pulang', 'VERB'),
('ke', 'ADP'),
('kampung', 'NOUN'),
('halaman', 'NOUN'),
('masing-masing.', 'DET'),
('Dalam', 'ADP'),
('video', 'NOUN'),
('pendek', 'ADJ'),
('terbitan', 'NOUN'),
('Jabatan', 'PROPN'),
('Keselamatan', 'PROPN'),
('Jalan', 'PROPN'),
('Raya', 'PROPN'),
('(JKJR)', 'PUNCT'),
('itu,', 'DET'),
('Dr', 'PROPN'),
('Mahathir', 'PROPN'),
('menasihati', 'VERB'),
('mereka', 'PRON'),
('supaya', 'SCONJ'),
('berhenti', 'VERB'),
('berehat', 'VERB'),
('dan', 'CCONJ'),
('tidur', 'VERB'),
('sebentar', 'ADV'),
('sekiranya', 'SCONJ'),
('mengantuk', 'NOUN'),
('ketika', 'SCONJ'),
('memandu.', 'VERB')]
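The majority-voting idea described for tagging models can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the code behind malaya.stack.voting_stack: majority_vote and the three model_* lists are hypothetical, the tags are made up, and the sketch assumes every model splits the text into the same tokens.

```python
import random
from collections import Counter
from typing import List, Tuple

Tagged = List[Tuple[str, str]]


def majority_vote(predictions: List[Tagged], seed: int = 0) -> Tagged:
    """Pick, for every token, the tag predicted by the most models."""
    rng = random.Random(seed)
    voted = []
    # zip(*predictions) walks the models' outputs token by token; this
    # assumes every model tokenizes the text identically.
    for candidates in zip(*predictions):
        token = candidates[0][0]
        counts = Counter(tag for _, tag in candidates)
        top = max(counts.values())
        # Tags sharing the highest count are tied; break the tie randomly.
        winners = [tag for tag, n in counts.items() if n == top]
        voted.append((token, rng.choice(winners)))
    return voted


# Made-up outputs from three hypothetical taggers.
model_a = [('Dr', 'PROPN'), ('Mahathir', 'PROPN'), ('memandu', 'VERB')]
model_b = [('Dr', 'NOUN'), ('Mahathir', 'PROPN'), ('memandu', 'VERB')]
model_c = [('Dr', 'PROPN'), ('Mahathir', 'PROPN'), ('memandu', 'NOUN')]
result = majority_vote([model_a, model_b, model_c])
print(result)
```

With three models, a single dissenting tag is outvoted. With only two models, any disagreement is a tie, so the result depends on the random pick; this is why an odd number of models greater than two is the safer choice.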
[10]:
string = 'KUALA LUMPUR: Sempena sambutan Aidilfitri minggu depan, Perdana Menteri Tun Dr Mahathir Mohamad dan Menteri Pengangkutan Anthony Loke Siew Fook menitipkan pesanan khas kepada orang ramai yang mahu pulang ke kampung halaman masing-masing. Dalam video pendek terbitan Jabatan Keselamatan Jalan Raya (JKJR) itu, Dr Mahathir menasihati mereka supaya berhenti berehat dan tidur sebentar sekiranya mengantuk ketika memandu.'
xlnet = malaya.dependency.transformer(model = 'xlnet')
alxlnet = malaya.dependency.transformer(model = 'alxlnet')
[11]:
tagging, indexing = malaya.stack.voting_stack([xlnet, xlnet, alxlnet], string)
malaya.dependency.dependency_graph(tagging, indexing).to_graphvis()