Dependency Parsing#
This tutorial is available as an IPython notebook at Malaya/example/dependency.
This module was trained only on standard language structure, so it is not safe to use it for local language structure.
This interface is deprecated; use the HuggingFace interface instead.
[1]:
import logging
logging.basicConfig(level=logging.INFO)
[2]:
%%time
import malaya
INFO:numexpr.utils:NumExpr defaulting to 8 threads.
CPU times: user 5.07 s, sys: 712 ms, total: 5.78 s
Wall time: 5.35 s
Models accuracy#
We use sklearn.metrics.classification_report for accuracy reporting. The full results are at https://malaya.readthedocs.io/en/latest/models-accuracy.html#dependency-parsing
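For readers unfamiliar with that report, it tabulates precision, recall and F1 for each label. Below is a minimal pure-Python sketch of those three quantities on toy labels written by hand. The helper name `per_label_scores` is hypothetical, and this is an illustration of the metrics, not the scikit-learn implementation.

```python
from collections import Counter

def per_label_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for two equal-length label lists."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1
        else:
            fp[guess] += 1  # predicted label was wrong
            fn[truth] += 1  # true label was missed
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[label] = (precision, recall, f1)
    return scores

# Toy dependency labels, written by hand for illustration only.
y_true = ['nsubj', 'obj', 'nsubj', 'root', 'obj']
y_pred = ['nsubj', 'obj', 'obj', 'root', 'obj']
scores = per_label_scores(y_true, y_pred)
for label, (p, r, f) in scores.items():
    print(label, round(p, 3), round(r, 3), round(f, 3))
```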
Describe supported dependencies#
[3]:
malaya.dependency.describe()
INFO:malaya_boilerplate.utils:you can read more from https://universaldependencies.org/treebanks/id_pud/index.html
[3]:
Index | Tag | Description |
---|---|---|
0 | acl | clausal modifier of noun |
1 | advcl | adverbial clause modifier |
2 | advmod | adverbial modifier |
3 | amod | adjectival modifier |
4 | appos | appositional modifier |
5 | aux | auxiliary |
6 | case | case marking |
7 | ccomp | clausal complement |
8 | compound | compound |
9 | compound:plur | plural compound |
10 | conj | conjunct |
11 | cop | cop |
12 | csubj | clausal subject |
13 | dep | dependent |
14 | det | determiner |
15 | fixed | multi-word expression |
16 | flat | name |
17 | iobj | indirect object |
18 | mark | marker |
19 | nmod | nominal modifier |
20 | nsubj | nominal subject |
21 | obj | direct object |
22 | parataxis | parataxis |
23 | root | root |
24 | xcomp | open clausal complement |
List available transformer Dependency models#
def available_transformer(version: str = 'v2'):
    """
    List available transformer dependency parsing models.
    Parameters
    ----------
    version : str, optional (default='v2')
        Version supported. Allowed values:
        * ``'v1'`` - version 1, maintained for knowledge graph.
        * ``'v2'`` - trained on a bigger dataset, better version.
    """
[4]:
malaya.dependency.available_transformer()
INFO:malaya.dependency:tested on test set at https://github.com/huseinzol05/malay-dataset/tree/master/parsing/dependency
[4]:
Model | Size (MB) | Quantized Size (MB) | Arc Accuracy | Types Accuracy | Root Accuracy |
---|---|---|---|---|---|
bert | 455.0 | 114.00 | 0.820450 | 0.79970 | 0.98936 |
tiny-bert | 69.7 | 17.50 | 0.795252 | 0.72470 | 0.98939 |
albert | 60.8 | 15.30 | 0.821895 | 0.79752 | 1.00000 |
tiny-albert | 33.4 | 8.51 | 0.786500 | 0.75870 | 1.00000 |
xlnet | 480.2 | 121.00 | 0.848110 | 0.82741 | 0.92101 |
alxlnet | 61.2 | 16.40 | 0.849290 | 0.82810 | 0.92099 |
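The table can be read as a size versus accuracy trade-off. The following is a minimal sketch of picking a model under a size budget. The figures are copied from the table; `MODELS` and `pick_model` are hypothetical names for illustration and are not part of Malaya. The table does not say whether the accuracies also hold for the quantized weights, so the sketch simply reuses them.

```python
# Figures copied from the table above:
# (size MB, quantized size MB, arc accuracy, types accuracy, root accuracy)
MODELS = {
    'bert': (455.0, 114.00, 0.820450, 0.79970, 0.98936),
    'tiny-bert': (69.7, 17.50, 0.795252, 0.72470, 0.98939),
    'albert': (60.8, 15.30, 0.821895, 0.79752, 1.00000),
    'tiny-albert': (33.4, 8.51, 0.786500, 0.75870, 1.00000),
    'xlnet': (480.2, 121.00, 0.848110, 0.82741, 0.92101),
    'alxlnet': (61.2, 16.40, 0.849290, 0.82810, 0.92099),
}

def pick_model(max_size_mb, quantized=False):
    """Return the model with the highest arc accuracy that fits the budget, or None."""
    size_col = 1 if quantized else 0
    candidates = [
        (stats[2], name)
        for name, stats in MODELS.items()
        if stats[size_col] <= max_size_mb
    ]
    if not candidates:
        return None
    return max(candidates)[1]

print(pick_model(100))
print(pick_model(50))
print(pick_model(10, quantized=True))
```

Ranking on arc accuracy alone is a design choice of this sketch; ranking on types or root accuracy would give a different order.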
Load xlnet dependency model#
def transformer(version: str = 'v2', model: str = 'xlnet', quantized: bool = False, **kwargs):
    """
    Load Transformer Dependency Parsing model, transfer learning Transformer + biaffine attention.
    Parameters
    ----------
    version : str, optional (default='v2')
        Version supported. Allowed values:
        * ``'v1'`` - version 1, maintained for knowledge graph.
        * ``'v2'`` - trained on a bigger dataset, better version.
    model : str, optional (default='xlnet')
        Model architecture supported. Allowed values:
        * ``'bert'`` - Google BERT BASE parameters.
        * ``'tiny-bert'`` - Google BERT TINY parameters.
        * ``'albert'`` - Google ALBERT BASE parameters.
        * ``'tiny-albert'`` - Google ALBERT TINY parameters.
        * ``'xlnet'`` - Google XLNET BASE parameters.
        * ``'alxlnet'`` - Malaya ALXLNET BASE parameters.
    quantized : bool, optional (default=False)
        If True, will load the 8-bit quantized model.
        A quantized model is not necessarily faster; it depends entirely on the machine.
    Returns
    -------
    result: model
        List of model classes:
        * if `bert` in model, will return `malaya.model.bert.DependencyBERT`.
        * if `xlnet` in model, will return `malaya.model.xlnet.DependencyXLNET`.
    """
[4]:
model = malaya.dependency.transformer(model = 'albert')
INFO:root:running dependency-v2/albert using device /device:CPU:0
Load Quantized model#
To load an 8-bit quantized model, simply pass quantized = True; the default is False.
Expect a slight accuracy drop from the quantized model. It is also not necessarily faster than the normal 32-bit float model; that depends entirely on the machine.
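To see where an accuracy drop can come from, here is a minimal sketch of generic affine 8-bit quantization in pure Python. It only illustrates the rounding error introduced when floats are mapped onto 256 levels. It is not the routine Malaya or TensorFlow uses, and `quantize` and `dequantize` are hypothetical helper names.

```python
def quantize(values, levels=256):
    """Map floats onto integer codes 0..levels-1 (generic affine scheme)."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / (levels - 1) or 1.0  # fall back to 1.0 for constant input
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo

def dequantize(codes, scale, lo):
    """Map integer codes back to approximate float values."""
    return [c * scale + lo for c in codes]

# Toy weights written by hand for illustration.
weights = [-1.2, -0.33, 0.0, 0.41, 0.98, 2.5]
codes, scale, lo = quantize(weights)
restored = dequantize(codes, scale, lo)

# Each restored weight is off by at most half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(codes)
print(max_error, scale / 2)
```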
[5]:
quantized_model = malaya.dependency.transformer(model = 'albert', quantized = True)
WARNING:root:Load quantized model will cause accuracy drop.
INFO:root:running dependency-v2/albert-quantized using device /device:CPU:0
Predict#
def predict(self, string: str):
    """
    Tag a string.
    Parameters
    ----------
    string: str
    Returns
    -------
    result: Tuple
    """
[6]:
string = 'Dr Mahathir menasihati mereka supaya berhenti berehat dan tidur sebentar sekiranya mengantuk ketika memandu.'
[7]:
d_object, tagging, indexing = model.predict(string)
d_object.to_graphvis()
[7]:
[8]:
d_object, tagging, indexing = quantized_model.predict(string)
d_object.to_graphvis()
[8]:
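The `tagging` and `indexing` values returned by `predict` are not printed in this tutorial. The sketch below ASSUMES that `tagging` is a list of (word, relation) pairs and `indexing` is a list of (word, head position) pairs, with positions starting at 1 and 0 standing for the artificial root. Both the shapes and the toy values are assumptions written by hand, not real model output, and `to_rows` is a hypothetical helper.

```python
# ASSUMPTION: shapes of `tagging` and `indexing` as described in the lead-in.
# Toy values written by hand, not produced by a model.
tagging = [('Ali', 'nsubj'), ('makan', 'root'), ('nasi', 'obj')]
indexing = [('Ali', 2), ('makan', 0), ('nasi', 2)]

def to_rows(tagging, indexing):
    """Combine both lists into (position, word, head_word, relation) rows."""
    words = [word for word, _ in tagging]
    rows = []
    pairs = zip(tagging, indexing)
    for position, ((word, relation), (_, head)) in enumerate(pairs, start=1):
        # Head position 0 is the artificial root; otherwise look up the head word.
        head_word = 'ROOT' if head == 0 else words[head - 1]
        rows.append((position, word, head_word, relation))
    return rows

rows = to_rows(tagging, indexing)
for row in rows:
    print(row)
```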
Voting stack model#
[10]:
alxlnet = malaya.dependency.transformer(model = 'alxlnet')
tagging, indexing = malaya.stack.voting_stack([model, model, alxlnet], string)
malaya.dependency.dependency_graph(tagging, indexing).to_graphvis()
INFO:root:running dependency-v2/alxlnet using device /device:CPU:0
[10]:
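The general idea behind a voting stack can be sketched as a per-token majority vote. This is a minimal illustration under that assumption and not necessarily what `malaya.stack.voting_stack` does internally; `majority_vote` is a hypothetical helper and the label sequences are toy values written by hand.

```python
from collections import Counter

def majority_vote(predictions):
    """predictions: list of label sequences, one per model, all the same length."""
    voted = []
    for labels in zip(*predictions):
        # most_common(1) picks the label with the highest count; on a tie the
        # label seen first wins, i.e. the one from the earlier model in the list.
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted

# Toy per-token labels from three hypothetical models.
model_a = ['nsubj', 'root', 'obj']
model_b = ['nsubj', 'root', 'nmod']
model_c = ['flat', 'root', 'obj']

result = majority_vote([model_a, model_b, model_c])
print(result)
```

Under a scheme like this, listing the same model twice gives its prediction two votes.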
Harder example#
[13]:
# https://www.astroawani.com/berita-malaysia/terbaik-tun-kita-geng-najib-razak-puji-tun-m-297884
s = """
KUALA LUMPUR: Dalam hal politik, jarang sekali untuk melihat dua figura ini - bekas Perdana Menteri, Datuk Seri Najib Razak dan Tun Dr Mahathir Mohamad mempunyai 'pandangan yang sama' atau sekapal. Namun, situasi itu berbeza apabila melibatkan isu ketidakpatuhan terhadap prosedur operasi standard (SOP). Najib, yang juga Ahli Parlimen Pekan memuji sikap Ahli Parlimen Langkawi itu yang mengaku bersalah selepas melanggar SOP kerana tidak mengambil suhu badan ketika masuk ke sebuah surau di Langkawi pada Sabtu lalu.
"""
[14]:
d_object, tagging, indexing = model.predict(s)
d_object.to_graphvis()
[14]:
[15]:
tagging, indexing = malaya.stack.voting_stack([model, model, alxlnet], s)
malaya.dependency.dependency_graph(tagging, indexing).to_graphvis()
[15]:
Dependency graph object#
To build a dependency graph from the output of a dependency model, call malaya.dependency.dependency_graph.
[16]:
graph = malaya.dependency.dependency_graph(tagging, indexing)
graph
[16]:
<malaya.function.parse_dependency.DependencyGraph at 0x16ab39c10>
Get nodes#
[17]:
graph.nodes
[17]:
defaultdict(<function malaya.function.parse_dependency.DependencyGraph.__init__.<locals>.<lambda>()>,
{0: {'address': 0,
'word': None,
'lemma': None,
'ctag': 'TOP',
'tag': 'TOP',
'feats': None,
'head': None,
'deps': defaultdict(list, {'root': [11]}),
'rel': None},
1: {'address': 1,
'word': 'KUALA',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 11,
'deps': defaultdict(list,
{'flat': [2], 'obl': [5], 'punct': [7]}),
'rel': 'nsubj'},
11: {'address': 11,
'word': 'melihat',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 0,
'deps': defaultdict(list,
{'nsubj': [1],
'advmod': [8, 9],
'case': [10],
'advcl': [29],
'dep': [42]}),
'rel': 'root'},
2: {'address': 2,
'word': 'LUMPUR',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 1,
'deps': defaultdict(list, {}),
'rel': 'flat'},
3: {'address': 3,
'word': ':',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 5,
'deps': defaultdict(list, {}),
'rel': 'punct'},
5: {'address': 5,
'word': 'hal',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 1,
'deps': defaultdict(list,
{'punct': [3], 'case': [4], 'compound': [6]}),
'rel': 'obl'},
4: {'address': 4,
'word': 'Dalam',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 5,
'deps': defaultdict(list, {}),
'rel': 'case'},
6: {'address': 6,
'word': 'politik',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 5,
'deps': defaultdict(list, {}),
'rel': 'compound'},
7: {'address': 7,
'word': ',',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 1,
'deps': defaultdict(list, {}),
'rel': 'punct'},
8: {'address': 8,
'word': 'jarang',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 11,
'deps': defaultdict(list, {}),
'rel': 'advmod'},
9: {'address': 9,
'word': 'sekali',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 11,
'deps': defaultdict(list, {}),
'rel': 'advmod'},
10: {'address': 10,
'word': 'untuk',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 11,
'deps': defaultdict(list, {}),
'rel': 'case'},
12: {'address': 12,
'word': 'dua',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 13,
'deps': defaultdict(list, {}),
'rel': 'nummod'},
13: {'address': 13,
'word': 'figura',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 29,
'deps': defaultdict(list,
{'nummod': [12],
'punct': [15],
'compound:plur': [16],
'flat': [17]}),
'rel': 'obj'},
29: {'address': 29,
'word': 'mempunyai',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 11,
'deps': defaultdict(list,
{'obj': [13, 31], 'punct': [37], 'mark': [38]}),
'rel': 'advcl'},
14: {'address': 14,
'word': 'ini',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 17,
'deps': defaultdict(list, {}),
'rel': 'det'},
17: {'address': 17,
'word': 'Perdana',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 13,
'deps': defaultdict(list,
{'det': [14],
'flat': [18],
'punct': [19],
'appos': [20],
'conj': [25]}),
'rel': 'flat'},
15: {'address': 15,
'word': '-',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 13,
'deps': defaultdict(list, {}),
'rel': 'punct'},
16: {'address': 16,
'word': 'bekas',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 13,
'deps': defaultdict(list, {}),
'rel': 'compound:plur'},
18: {'address': 18,
'word': 'Menteri',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 17,
'deps': defaultdict(list, {}),
'rel': 'flat'},
19: {'address': 19,
'word': ',',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 17,
'deps': defaultdict(list, {}),
'rel': 'punct'},
20: {'address': 20,
'word': 'Datuk',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 17,
'deps': defaultdict(list, {'flat': [21]}),
'rel': 'appos'},
21: {'address': 21,
'word': 'Seri',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 20,
'deps': defaultdict(list, {'flat': [22]}),
'rel': 'flat'},
22: {'address': 22,
'word': 'Najib',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 21,
'deps': defaultdict(list, {'flat': [23]}),
'rel': 'flat'},
23: {'address': 23,
'word': 'Razak',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 22,
'deps': defaultdict(list, {}),
'rel': 'flat'},
24: {'address': 24,
'word': 'dan',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 25,
'deps': defaultdict(list, {}),
'rel': 'cc'},
25: {'address': 25,
'word': 'Tun',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 17,
'deps': defaultdict(list, {'cc': [24], 'flat': [26]}),
'rel': 'conj'},
26: {'address': 26,
'word': 'Dr',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 25,
'deps': defaultdict(list, {'flat': [27]}),
'rel': 'flat'},
27: {'address': 27,
'word': 'Mahathir',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 26,
'deps': defaultdict(list, {'flat': [28]}),
'rel': 'flat'},
28: {'address': 28,
'word': 'Mohamad',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 27,
'deps': defaultdict(list, {}),
'rel': 'flat'},
30: {'address': 30,
'word': "'",
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 31,
'deps': defaultdict(list, {}),
'rel': 'punct'},
31: {'address': 31,
'word': 'pandangan',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 29,
'deps': defaultdict(list, {'punct': [30], 'amod': [33]}),
'rel': 'obj'},
32: {'address': 32,
'word': 'yang',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 36,
'deps': defaultdict(list, {}),
'rel': 'nsubj'},
36: {'address': 36,
'word': 'sekapal',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 33,
'deps': defaultdict(list,
{'nsubj': [32], 'punct': [34], 'cc': [35]}),
'rel': 'conj'},
33: {'address': 33,
'word': 'sama',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 31,
'deps': defaultdict(list, {'conj': [36]}),
'rel': 'amod'},
34: {'address': 34,
'word': "'",
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 36,
'deps': defaultdict(list, {}),
'rel': 'punct'},
35: {'address': 35,
'word': 'atau',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 36,
'deps': defaultdict(list, {}),
'rel': 'cc'},
37: {'address': 37,
'word': '.',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 29,
'deps': defaultdict(list, {}),
'rel': 'punct'},
38: {'address': 38,
'word': 'Namun',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 29,
'deps': defaultdict(list, {}),
'rel': 'mark'},
39: {'address': 39,
'word': ',',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 42,
'deps': defaultdict(list, {}),
'rel': 'punct'},
42: {'address': 42,
'word': 'berbeza',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 11,
'deps': defaultdict(list,
{'punct': [39, 54, 89],
'nsubj': [40],
'advcl': [44],
'dep': [55]}),
'rel': 'dep'},
40: {'address': 40,
'word': 'situasi',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 42,
'deps': defaultdict(list, {'det': [41]}),
'rel': 'nsubj'},
41: {'address': 41,
'word': 'itu',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 40,
'deps': defaultdict(list, {}),
'rel': 'det'},
43: {'address': 43,
'word': 'apabila',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 44,
'deps': defaultdict(list, {}),
'rel': 'mark'},
44: {'address': 44,
'word': 'melibatkan',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 42,
'deps': defaultdict(list, {'mark': [43], 'obj': [45]}),
'rel': 'advcl'},
45: {'address': 45,
'word': 'isu',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 44,
'deps': defaultdict(list, {'compound': [46], 'nmod': [48]}),
'rel': 'obj'},
46: {'address': 46,
'word': 'ketidakpatuhan',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 45,
'deps': defaultdict(list, {}),
'rel': 'compound'},
47: {'address': 47,
'word': 'terhadap',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 48,
'deps': defaultdict(list, {}),
'rel': 'case'},
48: {'address': 48,
'word': 'prosedur',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 45,
'deps': defaultdict(list,
{'case': [47],
'compound': [49],
'amod': [50],
'appos': [52]}),
'rel': 'nmod'},
49: {'address': 49,
'word': 'operasi',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 48,
'deps': defaultdict(list, {}),
'rel': 'compound'},
50: {'address': 50,
'word': 'standard',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 48,
'deps': defaultdict(list, {}),
'rel': 'amod'},
51: {'address': 51,
'word': '(',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 52,
'deps': defaultdict(list, {}),
'rel': 'punct'},
52: {'address': 52,
'word': 'SOP',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 48,
'deps': defaultdict(list, {'punct': [51, 53]}),
'rel': 'appos'},
53: {'address': 53,
'word': ')',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 52,
'deps': defaultdict(list, {}),
'rel': 'punct'},
54: {'address': 54,
'word': '.',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 42,
'deps': defaultdict(list, {}),
'rel': 'punct'},
55: {'address': 55,
'word': 'Najib',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 42,
'deps': defaultdict(list,
{'punct': [56], 'nsubj': [59], 'acl': [62]}),
'rel': 'dep'},
56: {'address': 56,
'word': ',',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 55,
'deps': defaultdict(list, {}),
'rel': 'punct'},
57: {'address': 57,
'word': 'yang',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 59,
'deps': defaultdict(list, {}),
'rel': 'nsubj'},
59: {'address': 59,
'word': 'Ahli',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 55,
'deps': defaultdict(list,
{'nsubj': [57], 'advmod': [58], 'flat': [60]}),
'rel': 'nsubj'},
58: {'address': 58,
'word': 'juga',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 59,
'deps': defaultdict(list, {}),
'rel': 'advmod'},
60: {'address': 60,
'word': 'Parlimen',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 59,
'deps': defaultdict(list, {'flat': [61]}),
'rel': 'flat'},
61: {'address': 61,
'word': 'Pekan',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 60,
'deps': defaultdict(list, {}),
'rel': 'flat'},
62: {'address': 62,
'word': 'memuji',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 55,
'deps': defaultdict(list, {'obj': [63]}),
'rel': 'acl'},
63: {'address': 63,
'word': 'sikap',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 62,
'deps': defaultdict(list, {'flat': [64], 'acl': [69]}),
'rel': 'obj'},
64: {'address': 64,
'word': 'Ahli',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 63,
'deps': defaultdict(list, {'flat': [65]}),
'rel': 'flat'},
65: {'address': 65,
'word': 'Parlimen',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 64,
'deps': defaultdict(list, {'flat': [66]}),
'rel': 'flat'},
66: {'address': 66,
'word': 'Langkawi',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 65,
'deps': defaultdict(list, {'det': [67]}),
'rel': 'flat'},
67: {'address': 67,
'word': 'itu',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 66,
'deps': defaultdict(list, {}),
'rel': 'det'},
68: {'address': 68,
'word': 'yang',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 69,
'deps': defaultdict(list, {}),
'rel': 'nsubj'},
69: {'address': 69,
'word': 'mengaku',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 63,
'deps': defaultdict(list, {'nsubj': [68], 'xcomp': [70]}),
'rel': 'acl'},
70: {'address': 70,
'word': 'bersalah',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 69,
'deps': defaultdict(list, {'xcomp': [72]}),
'rel': 'xcomp'},
71: {'address': 71,
'word': 'selepas',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 72,
'deps': defaultdict(list, {}),
'rel': 'case'},
72: {'address': 72,
'word': 'melanggar',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 70,
'deps': defaultdict(list,
{'case': [71], 'obj': [73], 'advcl': [76]}),
'rel': 'xcomp'},
73: {'address': 73,
'word': 'SOP',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 72,
'deps': defaultdict(list, {}),
'rel': 'obj'},
74: {'address': 74,
'word': 'kerana',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 76,
'deps': defaultdict(list, {}),
'rel': 'mark'},
76: {'address': 76,
'word': 'mengambil',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 72,
'deps': defaultdict(list,
{'mark': [74],
'advmod': [75],
'obj': [77],
'advcl': [80]}),
'rel': 'advcl'},
75: {'address': 75,
'word': 'tidak',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 76,
'deps': defaultdict(list, {}),
'rel': 'advmod'},
77: {'address': 77,
'word': 'suhu',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 76,
'deps': defaultdict(list, {'compound': [78]}),
'rel': 'obj'},
78: {'address': 78,
'word': 'badan',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 77,
'deps': defaultdict(list, {}),
'rel': 'compound'},
79: {'address': 79,
'word': 'ketika',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 80,
'deps': defaultdict(list, {}),
'rel': 'mark'},
80: {'address': 80,
'word': 'masuk',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 76,
'deps': defaultdict(list, {'mark': [79], 'obl': [83, 85, 87]}),
'rel': 'advcl'},
81: {'address': 81,
'word': 'ke',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 83,
'deps': defaultdict(list, {}),
'rel': 'case'},
83: {'address': 83,
'word': 'surau',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 80,
'deps': defaultdict(list, {'case': [81], 'det': [82]}),
'rel': 'obl'},
82: {'address': 82,
'word': 'sebuah',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 83,
'deps': defaultdict(list, {}),
'rel': 'det'},
84: {'address': 84,
'word': 'di',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 85,
'deps': defaultdict(list, {}),
'rel': 'case'},
85: {'address': 85,
'word': 'Langkawi',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 80,
'deps': defaultdict(list, {'case': [84]}),
'rel': 'obl'},
86: {'address': 86,
'word': 'pada',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 87,
'deps': defaultdict(list, {}),
'rel': 'case'},
87: {'address': 87,
'word': 'Sabtu',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 80,
'deps': defaultdict(list, {'case': [86], 'amod': [88]}),
'rel': 'obl'},
88: {'address': 88,
'word': 'lalu',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 87,
'deps': defaultdict(list, {}),
'rel': 'amod'},
89: {'address': 89,
'word': '.',
'lemma': '_',
'ctag': '_',
'tag': '_',
'feats': '_',
'head': 42,
'deps': defaultdict(list, {}),
'rel': 'punct'}})
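Each node stores its `head` address, so the tree can be walked upwards with plain dictionary lookups. A minimal sketch, using a hand-copied subset of the output above with only the fields needed here; `path_to_root` is a hypothetical helper, not a Malaya function.

```python
# Subset of the `graph.nodes` output above (fields trimmed for brevity).
nodes = {
    0: {'word': None, 'head': None, 'rel': None, 'deps': {'root': [11]}},
    1: {'word': 'KUALA', 'head': 11, 'rel': 'nsubj'},
    2: {'word': 'LUMPUR', 'head': 1, 'rel': 'flat'},
    11: {'word': 'melihat', 'head': 0, 'rel': 'root'},
}

def path_to_root(nodes, address):
    """Follow `head` pointers from a token up to the artificial node 0."""
    path = []
    while address != 0:
        node = nodes[address]
        path.append((node['word'], node['rel']))
        address = node['head']
    return path

# Node 0 is the artificial TOP node; its 'root' dependency points at the sentence root.
root_address = nodes[0]['deps']['root'][0]
root_word = nodes[root_address]['word']
print(root_word)
print(path_to_root(nodes, 2))
```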
Flatten the graph#
[20]:
list(graph.triples())
[20]:
[(('melihat', '_'), 'nsubj', ('KUALA', '_')),
(('KUALA', '_'), 'flat', ('LUMPUR', '_')),
(('KUALA', '_'), 'obl', ('hal', '_')),
(('hal', '_'), 'punct', (':', '_')),
(('hal', '_'), 'case', ('Dalam', '_')),
(('hal', '_'), 'compound', ('politik', '_')),
(('KUALA', '_'), 'punct', (',', '_')),
(('melihat', '_'), 'advmod', ('jarang', '_')),
(('melihat', '_'), 'advmod', ('sekali', '_')),
(('melihat', '_'), 'case', ('untuk', '_')),
(('melihat', '_'), 'advcl', ('mempunyai', '_')),
(('mempunyai', '_'), 'obj', ('figura', '_')),
(('figura', '_'), 'nummod', ('dua', '_')),
(('figura', '_'), 'punct', ('-', '_')),
(('figura', '_'), 'compound:plur', ('bekas', '_')),
(('figura', '_'), 'flat', ('Perdana', '_')),
(('Perdana', '_'), 'det', ('ini', '_')),
(('Perdana', '_'), 'flat', ('Menteri', '_')),
(('Perdana', '_'), 'punct', (',', '_')),
(('Perdana', '_'), 'appos', ('Datuk', '_')),
(('Datuk', '_'), 'flat', ('Seri', '_')),
(('Seri', '_'), 'flat', ('Najib', '_')),
(('Najib', '_'), 'flat', ('Razak', '_')),
(('Perdana', '_'), 'conj', ('Tun', '_')),
(('Tun', '_'), 'cc', ('dan', '_')),
(('Tun', '_'), 'flat', ('Dr', '_')),
(('Dr', '_'), 'flat', ('Mahathir', '_')),
(('Mahathir', '_'), 'flat', ('Mohamad', '_')),
(('mempunyai', '_'), 'obj', ('pandangan', '_')),
(('pandangan', '_'), 'punct', ("'", '_')),
(('pandangan', '_'), 'amod', ('sama', '_')),
(('sama', '_'), 'conj', ('sekapal', '_')),
(('sekapal', '_'), 'nsubj', ('yang', '_')),
(('sekapal', '_'), 'punct', ("'", '_')),
(('sekapal', '_'), 'cc', ('atau', '_')),
(('mempunyai', '_'), 'punct', ('.', '_')),
(('mempunyai', '_'), 'mark', ('Namun', '_')),
(('melihat', '_'), 'dep', ('berbeza', '_')),
(('berbeza', '_'), 'punct', (',', '_')),
(('berbeza', '_'), 'nsubj', ('situasi', '_')),
(('situasi', '_'), 'det', ('itu', '_')),
(('berbeza', '_'), 'advcl', ('melibatkan', '_')),
(('melibatkan', '_'), 'mark', ('apabila', '_')),
(('melibatkan', '_'), 'obj', ('isu', '_')),
(('isu', '_'), 'compound', ('ketidakpatuhan', '_')),
(('isu', '_'), 'nmod', ('prosedur', '_')),
(('prosedur', '_'), 'case', ('terhadap', '_')),
(('prosedur', '_'), 'compound', ('operasi', '_')),
(('prosedur', '_'), 'amod', ('standard', '_')),
(('prosedur', '_'), 'appos', ('SOP', '_')),
(('SOP', '_'), 'punct', ('(', '_')),
(('SOP', '_'), 'punct', (')', '_')),
(('berbeza', '_'), 'punct', ('.', '_')),
(('berbeza', '_'), 'dep', ('Najib', '_')),
(('Najib', '_'), 'punct', (',', '_')),
(('Najib', '_'), 'nsubj', ('Ahli', '_')),
(('Ahli', '_'), 'nsubj', ('yang', '_')),
(('Ahli', '_'), 'advmod', ('juga', '_')),
(('Ahli', '_'), 'flat', ('Parlimen', '_')),
(('Parlimen', '_'), 'flat', ('Pekan', '_')),
(('Najib', '_'), 'acl', ('memuji', '_')),
(('memuji', '_'), 'obj', ('sikap', '_')),
(('sikap', '_'), 'flat', ('Ahli', '_')),
(('Ahli', '_'), 'flat', ('Parlimen', '_')),
(('Parlimen', '_'), 'flat', ('Langkawi', '_')),
(('Langkawi', '_'), 'det', ('itu', '_')),
(('sikap', '_'), 'acl', ('mengaku', '_')),
(('mengaku', '_'), 'nsubj', ('yang', '_')),
(('mengaku', '_'), 'xcomp', ('bersalah', '_')),
(('bersalah', '_'), 'xcomp', ('melanggar', '_')),
(('melanggar', '_'), 'case', ('selepas', '_')),
(('melanggar', '_'), 'obj', ('SOP', '_')),
(('melanggar', '_'), 'advcl', ('mengambil', '_')),
(('mengambil', '_'), 'mark', ('kerana', '_')),
(('mengambil', '_'), 'advmod', ('tidak', '_')),
(('mengambil', '_'), 'obj', ('suhu', '_')),
(('suhu', '_'), 'compound', ('badan', '_')),
(('mengambil', '_'), 'advcl', ('masuk', '_')),
(('masuk', '_'), 'mark', ('ketika', '_')),
(('masuk', '_'), 'obl', ('surau', '_')),
(('surau', '_'), 'case', ('ke', '_')),
(('surau', '_'), 'det', ('sebuah', '_')),
(('masuk', '_'), 'obl', ('Langkawi', '_')),
(('Langkawi', '_'), 'case', ('di', '_')),
(('masuk', '_'), 'obl', ('Sabtu', '_')),
(('Sabtu', '_'), 'case', ('pada', '_')),
(('Sabtu', '_'), 'amod', ('lalu', '_')),
(('berbeza', '_'), 'punct', ('.', '_'))]
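Because each triple has the form ((head word, tag), relation, (dependent word, tag)), filtering by relation takes one list comprehension. A minimal sketch on a few triples copied from the output above; `pairs_for` is a hypothetical helper.

```python
# A few triples copied from the output above.
triples = [
    (('melihat', '_'), 'nsubj', ('KUALA', '_')),
    (('KUALA', '_'), 'flat', ('LUMPUR', '_')),
    (('berbeza', '_'), 'nsubj', ('situasi', '_')),
    (('melibatkan', '_'), 'obj', ('isu', '_')),
    (('memuji', '_'), 'obj', ('sikap', '_')),
]

def pairs_for(triples, relation):
    """Return (head_word, dependent_word) pairs for one relation."""
    return [(head[0], dep[0]) for head, rel, dep in triples if rel == relation]

subjects = pairs_for(triples, 'nsubj')
objects = pairs_for(triples, 'obj')
print(subjects)
print(objects)
```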
Generate networkx#
Make sure you have already installed networkx:
pip install networkx
[22]:
digraph = graph.to_networkx()
digraph
[22]:
<networkx.classes.multidigraph.MultiDiGraph at 0x16ab39d50>
[23]:
import networkx as nx
import matplotlib.pyplot as plt
nx.draw_networkx(digraph)
plt.show()

[24]:
digraph.edges()
[24]:
OutMultiEdgeDataView([(1, 11), (2, 1), (3, 5), (4, 5), (5, 1), (6, 5), (7, 1), (8, 11), (9, 11), (10, 11), (12, 13), (13, 29), (14, 17), (15, 13), (16, 13), (17, 13), (18, 17), (19, 17), (20, 17), (21, 20), (22, 21), (23, 22), (24, 25), (25, 17), (26, 25), (27, 26), (28, 27), (29, 11), (30, 31), (31, 29), (32, 36), (33, 31), (34, 36), (35, 36), (36, 33), (37, 29), (38, 29), (39, 42), (40, 42), (41, 40), (42, 11), (43, 44), (44, 42), (45, 44), (46, 45), (47, 48), (48, 45), (49, 48), (50, 48), (51, 52), (52, 48), (53, 52), (54, 42), (55, 42), (56, 55), (57, 59), (58, 59), (59, 55), (60, 59), (61, 60), (62, 55), (63, 62), (64, 63), (65, 64), (66, 65), (67, 66), (68, 69), (69, 63), (70, 69), (71, 72), (72, 70), (73, 72), (74, 76), (75, 76), (76, 72), (77, 76), (78, 77), (79, 80), (80, 76), (81, 83), (82, 83), (83, 80), (84, 85), (85, 80), (86, 87), (87, 80), (88, 87), (89, 42)])
[25]:
digraph.nodes()
[25]:
NodeView((1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89))
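Comparing the edge list with the `head` values in `graph.nodes`, each pair appears to run from a dependent to its head. Under that reading, the root and the depth of every token can be recovered without networkx. A minimal pure-Python sketch on a subset of the edges above; `depth` is a hypothetical helper.

```python
# Subset of the (dependent, head) pairs from `digraph.edges()` above.
edges = [(1, 11), (2, 1), (3, 5), (4, 5), (5, 1),
         (6, 5), (7, 1), (8, 11), (9, 11), (10, 11)]

# In a dependency tree every token has exactly one head.
head_of = dict(edges)

def depth(node):
    """Number of edges between a token and the sentence root."""
    steps = 0
    while node in head_of:
        node = head_of[node]
        steps += 1
    return steps

# The root is the only address that appears as a head but never as a dependent.
roots = {head for _, head in edges} - set(head_of)
depths = {node: depth(node) for node in head_of}
print(roots)
print(depths)
```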
[26]:
labels = {i:graph.get_by_address(i)['word'] for i in digraph.nodes()}
labels
[26]:
{1: 'KUALA',
2: 'LUMPUR',
3: ':',
4: 'Dalam',
5: 'hal',
6: 'politik',
7: ',',
8: 'jarang',
9: 'sekali',
10: 'untuk',
11: 'melihat',
12: 'dua',
13: 'figura',
14: 'ini',
15: '-',
16: 'bekas',
17: 'Perdana',
18: 'Menteri',
19: ',',
20: 'Datuk',
21: 'Seri',
22: 'Najib',
23: 'Razak',
24: 'dan',
25: 'Tun',
26: 'Dr',
27: 'Mahathir',
28: 'Mohamad',
29: 'mempunyai',
30: "'",
31: 'pandangan',
32: 'yang',
33: 'sama',
34: "'",
35: 'atau',
36: 'sekapal',
37: '.',
38: 'Namun',
39: ',',
40: 'situasi',
41: 'itu',
42: 'berbeza',
43: 'apabila',
44: 'melibatkan',
45: 'isu',
46: 'ketidakpatuhan',
47: 'terhadap',
48: 'prosedur',
49: 'operasi',
50: 'standard',
51: '(',
52: 'SOP',
53: ')',
54: '.',
55: 'Najib',
56: ',',
57: 'yang',
58: 'juga',
59: 'Ahli',
60: 'Parlimen',
61: 'Pekan',
62: 'memuji',
63: 'sikap',
64: 'Ahli',
65: 'Parlimen',
66: 'Langkawi',
67: 'itu',
68: 'yang',
69: 'mengaku',
70: 'bersalah',
71: 'selepas',
72: 'melanggar',
73: 'SOP',
74: 'kerana',
75: 'tidak',
76: 'mengambil',
77: 'suhu',
78: 'badan',
79: 'ketika',
80: 'masuk',
81: 'ke',
82: 'sebuah',
83: 'surau',
84: 'di',
85: 'Langkawi',
86: 'pada',
87: 'Sabtu',
88: 'lalu',
89: '.'}
[27]:
plt.figure(figsize=(15,5))
nx.draw_networkx(digraph,labels=labels)
plt.show()

Vectorize#
Let's say you want to visualize word-level representations in a lower dimension. You can use model.vectorize:
def vectorize(self, string: str):
    """
    Vectorize a string.
    Parameters
    ----------
    string: str
    Returns
    -------
    result: np.array
    """
[28]:
r = quantized_model.vectorize(s)
[29]:
x = [i[0] for i in r]
y = [i[1] for i in r]
[30]:
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt
tsne = TSNE().fit_transform(y)
tsne.shape
[30]:
(89, 2)
[31]:
plt.figure(figsize = (7, 7))
plt.scatter(tsne[:, 0], tsne[:, 1])
labels = x
for label, x, y in zip(
    labels, tsne[:, 0], tsne[:, 1]
):
    label = (
        '%s, %.3f' % (label[0], label[1])
        if isinstance(label, list)
        else label
    )
    plt.annotate(
        label,
        xy = (x, y),
        xytext = (0, 0),
        textcoords = 'offset points',
    )
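Besides plotting, word vectors can be compared directly. The sketch below ASSUMES, as the unpacking into `x` and `y` suggests, that `vectorize` yields (word, vector) pairs. The 3-dimensional vectors are toy values written by hand, not real model output, and `cosine` and `nearest` are hypothetical helpers.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# ASSUMPTION: toy stand-in for the (word, vector) pairs returned by `vectorize`.
r = [
    ('Najib', [0.9, 0.1, 0.0]),
    ('Mahathir', [0.8, 0.2, 0.1]),
    ('surau', [0.0, 0.1, 0.9]),
]

def nearest(r, index):
    """Return the word whose vector is most similar to the one at `index`."""
    _, target = r[index]
    scored = [
        (cosine(target, vec), word)
        for i, (word, vec) in enumerate(r)
        if i != index
    ]
    return max(scored)[1]

print(nearest(r, 0))
```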
