Dependency Parsing#

This tutorial is available as an IPython notebook at Malaya/example/dependency.

This module was trained only on standard language structure, so it is not safe to use it for local language structure.

This interface is deprecated; use the HuggingFace interface instead.

[1]:
import logging

logging.basicConfig(level=logging.INFO)
[2]:
%%time
import malaya
INFO:numexpr.utils:NumExpr defaulting to 8 threads.
CPU times: user 5.07 s, sys: 712 ms, total: 5.78 s
Wall time: 5.35 s

Models accuracy#

We use sklearn.metrics.classification_report for accuracy reporting; the full results are at https://malaya.readthedocs.io/en/latest/models-accuracy.html#dependency-parsing
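
For readers unfamiliar with that report, the per-label figures it lists (precision, recall, F1 and support) can be sketched in plain Python. This is only an illustration of the arithmetic, not scikit-learn's implementation; the helper name `per_label_report` and the toy tags are made up for this example.

```python
from collections import Counter


def per_label_report(gold, predicted):
    """Return {label: (precision, recall, f1, support)} for two equal-length tag lists."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    true_pos, pred_count, gold_count = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        gold_count[g] += 1
        pred_count[p] += 1
        if g == p:
            true_pos[g] += 1
    report = {}
    for label in sorted(set(gold_count) | set(pred_count)):
        tp = true_pos[label]
        precision = tp / pred_count[label] if pred_count[label] else 0.0
        recall = tp / gold_count[label] if gold_count[label] else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        # Support is the number of gold tokens carrying this label.
        report[label] = (precision, recall, f1, gold_count[label])
    return report


# Toy data, invented for illustration only (not real model output).
gold = ["nsubj", "root", "obj", "punct", "nsubj"]
pred = ["nsubj", "root", "nsubj", "punct", "obj"]
report = per_label_report(gold, pred)
for label, (p, r, f1, support) in report.items():
    print(f"{label:<6} precision={p:.2f} recall={r:.2f} f1={f1:.2f} support={support}")
```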

Describe supported dependencies#

[3]:
malaya.dependency.describe()
INFO:malaya_boilerplate.utils:you can read more from https://universaldependencies.org/treebanks/id_pud/index.html
[3]:
    Tag            Description
0   acl            clausal modifier of noun
1   advcl          adverbial clause modifier
2   advmod         adverbial modifier
3   amod           adjectival modifier
4   appos          appositional modifier
5   aux            auxiliary
6   case           case marking
7   ccomp          clausal complement
8   compound       compound
9   compound:plur  plural compound
10  conj           conjunct
11  cop            cop
12  csubj          clausal subject
13  dep            dependent
14  det            determiner
15  fixed          multi-word expression
16  flat           name
17  iobj           indirect object
18  mark           marker
19  nmod           nominal modifier
20  nsubj          nominal subject
21  obj            direct object
22  parataxis      parataxis
23  root           root
24  xcomp          open clausal complement

List available transformer Dependency models#

def available_transformer(version: str = 'v2'):
    """
    List available transformer dependency parsing models.

    Parameters
    ----------
    version : str, optional (default='v2')
        Version supported. Allowed values:

        * ``'v1'`` - version 1, maintained for knowledge graph.
        * ``'v2'`` - trained on a bigger dataset, better version.

    """
[4]:
malaya.dependency.available_transformer()
INFO:malaya.dependency:tested on test set at https://github.com/huseinzol05/malay-dataset/tree/master/parsing/dependency
[4]:
Model        Size (MB)  Quantized Size (MB)  Arc Accuracy  Types Accuracy  Root Accuracy
bert             455.0               114.00      0.820450         0.79970        0.98936
tiny-bert         69.7                17.50      0.795252         0.72470        0.98939
albert            60.8                15.30      0.821895         0.79752        1.00000
tiny-albert       33.4                 8.51      0.786500         0.75870        1.00000
xlnet            480.2               121.00      0.848110         0.82741        0.92101
alxlnet           61.2                16.40      0.849290         0.82810        0.92099
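
The table is essentially a size versus accuracy trade-off, so one way to read it is to pick the smallest model that clears an accuracy floor. The sketch below does that with the figures copied from the table; `MODELS` and `smallest_model` are hypothetical names introduced here, not part of Malaya. Note that the table lists a single set of accuracies per model, so ranking by quantized size still uses those same scores.

```python
# Figures copied from the table above:
# (size MB, quantized size MB, arc accuracy, types accuracy, root accuracy).
MODELS = {
    "bert": (455.0, 114.00, 0.820450, 0.79970, 0.98936),
    "tiny-bert": (69.7, 17.50, 0.795252, 0.72470, 0.98939),
    "albert": (60.8, 15.30, 0.821895, 0.79752, 1.00000),
    "tiny-albert": (33.4, 8.51, 0.786500, 0.75870, 1.00000),
    "xlnet": (480.2, 121.00, 0.848110, 0.82741, 0.92101),
    "alxlnet": (61.2, 16.40, 0.849290, 0.82810, 0.92099),
}


def smallest_model(min_arc_accuracy, quantized=False):
    """Return the name of the smallest model meeting the arc-accuracy floor, or None."""
    size_index = 1 if quantized else 0
    candidates = [
        (stats[size_index], name)
        for name, stats in MODELS.items()
        if stats[2] >= min_arc_accuracy
    ]
    # min() compares sizes first, so the smallest qualifying model wins.
    return min(candidates)[1] if candidates else None


print(smallest_model(0.82))
print(smallest_model(0.84))
print(smallest_model(0.0, quantized=True))
```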

Load transformer dependency model#

def transformer(version: str = 'v2', model: str = 'xlnet', quantized: bool = False, **kwargs):
    """
    Load Transformer Dependency Parsing model, transfer learning Transformer + biaffine attention.

    Parameters
    ----------
    version : str, optional (default='v2')
        Version supported. Allowed values:

        * ``'v1'`` - version 1, maintained for knowledge graph.
        * ``'v2'`` - trained on a bigger dataset, better version.

    model : str, optional (default='xlnet')
        Model architecture supported. Allowed values:

        * ``'bert'`` - Google BERT BASE parameters.
        * ``'tiny-bert'`` - Google BERT TINY parameters.
        * ``'albert'`` - Google ALBERT BASE parameters.
        * ``'tiny-albert'`` - Google ALBERT TINY parameters.
        * ``'xlnet'`` - Google XLNET BASE parameters.
        * ``'alxlnet'`` - Malaya ALXLNET BASE parameters.

    quantized : bool, optional (default=False)
        If True, loads the 8-bit quantized model.
        A quantized model is not necessarily faster; it depends on the machine.

    Returns
    -------
    result: model
        List of model classes:

        * if `bert` in model, will return `malaya.model.bert.DependencyBERT`.
        * if `xlnet` in model, will return `malaya.model.xlnet.DependencyXLNET`.
    """
[4]:
model = malaya.dependency.transformer(model = 'albert')
INFO:root:running dependency-v2/albert using device /device:CPU:0

Load Quantized model#

To load the 8-bit quantized model, pass quantized = True; the default is False.

Expect a slight accuracy drop from the quantized model. It is also not necessarily faster than the normal 32-bit float model; that depends on the machine.

[5]:
quantized_model = malaya.dependency.transformer(model = 'albert', quantized = True)
WARNING:root:Load quantized model will cause accuracy drop.
INFO:root:running dependency-v2/albert-quantized using device /device:CPU:0
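
Since the speed difference depends on the machine, one way to check it locally is to time both models on the same input. Below is a minimal sketch; `median_latency` and `fake_predict` are hypothetical names introduced here, and `fake_predict` only stands in for a real predict method so that the snippet runs without Malaya installed.

```python
import time


def median_latency(predict, text, repeats=5):
    """Call predict(text) `repeats` times and return the middle wall-clock time in seconds.

    For an even `repeats`, this is the upper of the two middle values.
    """
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        predict(text)
        timings.append(time.perf_counter() - start)
    timings.sort()
    return timings[len(timings) // 2]


def fake_predict(text):
    """Stand-in for a model's predict method (assumption: it takes one string)."""
    return text.split()


latency = median_latency(fake_predict, "Dr Mahathir menasihati mereka", repeats=5)
print(f"median latency: {latency:.6f} s")
```

With the two models loaded as above, the same helper could be called once with `model.predict` and once with `quantized_model.predict` on an identical sentence, and the two medians compared.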

Predict#

def predict(self, string: str):
    """
    Tag a string.

    Parameters
    ----------
    string: str

    Returns
    -------
    result: Tuple
    """
[6]:
string = 'Dr Mahathir menasihati mereka supaya berhenti berehat dan tidur sebentar sekiranya mengantuk ketika memandu.'
[7]:
d_object, tagging, indexing = model.predict(string)
d_object.to_graphvis()
[7]:
_images/load-dependency_17_0.svg
[8]:
d_object, tagging, indexing = quantized_model.predict(string)
d_object.to_graphvis()
[8]:
_images/load-dependency_18_0.svg
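
Besides the graph object, `predict` returns `tagging` and `indexing`. The sketch below assumes, without confirmation from this page, that `tagging` is a list of `(word, relation)` pairs and `indexing` a parallel list of `(word, head_position)` pairs with 1-based heads and 0 for the root. Under that assumption the two lists can be merged into CoNLL-like rows; `to_rows` and the toy values are invented for illustration.

```python
def to_rows(tagging, indexing):
    """Merge parallel (word, relation) and (word, head) lists into (position, word, head, relation) rows."""
    if len(tagging) != len(indexing):
        raise ValueError("tagging and indexing must be the same length")
    rows = []
    pairs = zip(tagging, indexing)
    for position, ((word, relation), (_, head)) in enumerate(pairs, start=1):
        rows.append((position, word, head, relation))
    return rows


# Hand-written toy values in the assumed shape, not real model output.
tagging = [("Dr", "nsubj"), ("Mahathir", "flat"), ("menasihati", "root"), ("mereka", "obj")]
indexing = [("Dr", 3), ("Mahathir", 1), ("menasihati", 0), ("mereka", 3)]

rows = to_rows(tagging, indexing)
for position, word, head, relation in rows:
    print(f"{position}\t{word}\t{head}\t{relation}")
```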

Voting stack model#

[10]:
alxlnet = malaya.dependency.transformer(model = 'alxlnet')
tagging, indexing = malaya.stack.voting_stack([model, model, alxlnet], string)
malaya.dependency.dependency_graph(tagging, indexing).to_graphvis()
INFO:root:running dependency-v2/alxlnet using device /device:CPU:0
[10]:
_images/load-dependency_20_1.svg
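
The general idea behind a voting stack can be sketched as a per-token majority vote over the label sequences produced by several models. The code below illustrates that idea only and is not Malaya's actual implementation of `malaya.stack.voting_stack`; `majority_vote` and the toy labels are assumptions made for this example.

```python
from collections import Counter


def majority_vote(predictions):
    """Pick the most common label per token across several equal-length label sequences."""
    if not predictions:
        raise ValueError("need at least one prediction sequence")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all prediction sequences must have the same length")
    voted = []
    for labels in zip(*predictions):
        # Ties go to the label seen first, so earlier models win ties.
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted


# Toy label sequences, invented for illustration.
albert_labels = ["nsubj", "flat", "root", "obj"]
alxlnet_labels = ["nsubj", "compound", "root", "iobj"]

voted = majority_vote([albert_labels, albert_labels, alxlnet_labels])
print(voted)
```

Under this scheme, listing a model twice, as in the call above, gives its labels two votes, so the third model can never overrule it on its own.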

Harder example#

[13]:
# https://www.astroawani.com/berita-malaysia/terbaik-tun-kita-geng-najib-razak-puji-tun-m-297884

s = """
KUALA LUMPUR: Dalam hal politik, jarang sekali untuk melihat dua figura ini - bekas Perdana Menteri, Datuk Seri Najib Razak dan Tun Dr Mahathir Mohamad mempunyai 'pandangan yang sama' atau sekapal. Namun, situasi itu berbeza apabila melibatkan isu ketidakpatuhan terhadap prosedur operasi standard (SOP). Najib, yang juga Ahli Parlimen Pekan memuji sikap Ahli Parlimen Langkawi itu yang mengaku bersalah selepas melanggar SOP kerana tidak mengambil suhu badan ketika masuk ke sebuah surau di Langkawi pada Sabtu lalu.
"""
[14]:
d_object, tagging, indexing = model.predict(s)
d_object.to_graphvis()
[14]:
_images/load-dependency_23_0.svg
[15]:
tagging, indexing = malaya.stack.voting_stack([model, model, alxlnet], s)
malaya.dependency.dependency_graph(tagging, indexing).to_graphvis()
[15]:
_images/load-dependency_24_0.svg

Dependency graph object#

To build a dependency graph object from the tagging and indexing outputs of a dependency model, call malaya.dependency.dependency_graph.

[16]:
graph = malaya.dependency.dependency_graph(tagging, indexing)
graph
[16]:
<malaya.function.parse_dependency.DependencyGraph at 0x16ab39c10>

Generate graphvis#

[17]:
graph.to_graphvis()
[17]:
_images/load-dependency_28_0.svg

Get nodes#

[17]:
graph.nodes
[17]:
defaultdict(<function malaya.function.parse_dependency.DependencyGraph.__init__.<locals>.<lambda>()>,
            {0: {'address': 0,
              'word': None,
              'lemma': None,
              'ctag': 'TOP',
              'tag': 'TOP',
              'feats': None,
              'head': None,
              'deps': defaultdict(list, {'root': [11]}),
              'rel': None},
             1: {'address': 1,
              'word': 'KUALA',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 11,
              'deps': defaultdict(list,
                          {'flat': [2], 'obl': [5], 'punct': [7]}),
              'rel': 'nsubj'},
             11: {'address': 11,
              'word': 'melihat',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 0,
              'deps': defaultdict(list,
                          {'nsubj': [1],
                           'advmod': [8, 9],
                           'case': [10],
                           'advcl': [29],
                           'dep': [42]}),
              'rel': 'root'},
             2: {'address': 2,
              'word': 'LUMPUR',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 1,
              'deps': defaultdict(list, {}),
              'rel': 'flat'},
             3: {'address': 3,
              'word': ':',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 5,
              'deps': defaultdict(list, {}),
              'rel': 'punct'},
             5: {'address': 5,
              'word': 'hal',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 1,
              'deps': defaultdict(list,
                          {'punct': [3], 'case': [4], 'compound': [6]}),
              'rel': 'obl'},
             4: {'address': 4,
              'word': 'Dalam',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 5,
              'deps': defaultdict(list, {}),
              'rel': 'case'},
             6: {'address': 6,
              'word': 'politik',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 5,
              'deps': defaultdict(list, {}),
              'rel': 'compound'},
             7: {'address': 7,
              'word': ',',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 1,
              'deps': defaultdict(list, {}),
              'rel': 'punct'},
             8: {'address': 8,
              'word': 'jarang',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 11,
              'deps': defaultdict(list, {}),
              'rel': 'advmod'},
             9: {'address': 9,
              'word': 'sekali',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 11,
              'deps': defaultdict(list, {}),
              'rel': 'advmod'},
             10: {'address': 10,
              'word': 'untuk',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 11,
              'deps': defaultdict(list, {}),
              'rel': 'case'},
             12: {'address': 12,
              'word': 'dua',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 13,
              'deps': defaultdict(list, {}),
              'rel': 'nummod'},
             13: {'address': 13,
              'word': 'figura',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 29,
              'deps': defaultdict(list,
                          {'nummod': [12],
                           'punct': [15],
                           'compound:plur': [16],
                           'flat': [17]}),
              'rel': 'obj'},
             29: {'address': 29,
              'word': 'mempunyai',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 11,
              'deps': defaultdict(list,
                          {'obj': [13, 31], 'punct': [37], 'mark': [38]}),
              'rel': 'advcl'},
             14: {'address': 14,
              'word': 'ini',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 17,
              'deps': defaultdict(list, {}),
              'rel': 'det'},
             17: {'address': 17,
              'word': 'Perdana',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 13,
              'deps': defaultdict(list,
                          {'det': [14],
                           'flat': [18],
                           'punct': [19],
                           'appos': [20],
                           'conj': [25]}),
              'rel': 'flat'},
             15: {'address': 15,
              'word': '-',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 13,
              'deps': defaultdict(list, {}),
              'rel': 'punct'},
             16: {'address': 16,
              'word': 'bekas',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 13,
              'deps': defaultdict(list, {}),
              'rel': 'compound:plur'},
             18: {'address': 18,
              'word': 'Menteri',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 17,
              'deps': defaultdict(list, {}),
              'rel': 'flat'},
             19: {'address': 19,
              'word': ',',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 17,
              'deps': defaultdict(list, {}),
              'rel': 'punct'},
             20: {'address': 20,
              'word': 'Datuk',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 17,
              'deps': defaultdict(list, {'flat': [21]}),
              'rel': 'appos'},
             21: {'address': 21,
              'word': 'Seri',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 20,
              'deps': defaultdict(list, {'flat': [22]}),
              'rel': 'flat'},
             22: {'address': 22,
              'word': 'Najib',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 21,
              'deps': defaultdict(list, {'flat': [23]}),
              'rel': 'flat'},
             23: {'address': 23,
              'word': 'Razak',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 22,
              'deps': defaultdict(list, {}),
              'rel': 'flat'},
             24: {'address': 24,
              'word': 'dan',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 25,
              'deps': defaultdict(list, {}),
              'rel': 'cc'},
             25: {'address': 25,
              'word': 'Tun',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 17,
              'deps': defaultdict(list, {'cc': [24], 'flat': [26]}),
              'rel': 'conj'},
             26: {'address': 26,
              'word': 'Dr',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 25,
              'deps': defaultdict(list, {'flat': [27]}),
              'rel': 'flat'},
             27: {'address': 27,
              'word': 'Mahathir',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 26,
              'deps': defaultdict(list, {'flat': [28]}),
              'rel': 'flat'},
             28: {'address': 28,
              'word': 'Mohamad',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 27,
              'deps': defaultdict(list, {}),
              'rel': 'flat'},
             30: {'address': 30,
              'word': "'",
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 31,
              'deps': defaultdict(list, {}),
              'rel': 'punct'},
             31: {'address': 31,
              'word': 'pandangan',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 29,
              'deps': defaultdict(list, {'punct': [30], 'amod': [33]}),
              'rel': 'obj'},
             32: {'address': 32,
              'word': 'yang',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 36,
              'deps': defaultdict(list, {}),
              'rel': 'nsubj'},
             36: {'address': 36,
              'word': 'sekapal',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 33,
              'deps': defaultdict(list,
                          {'nsubj': [32], 'punct': [34], 'cc': [35]}),
              'rel': 'conj'},
             33: {'address': 33,
              'word': 'sama',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 31,
              'deps': defaultdict(list, {'conj': [36]}),
              'rel': 'amod'},
             34: {'address': 34,
              'word': "'",
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 36,
              'deps': defaultdict(list, {}),
              'rel': 'punct'},
             35: {'address': 35,
              'word': 'atau',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 36,
              'deps': defaultdict(list, {}),
              'rel': 'cc'},
             37: {'address': 37,
              'word': '.',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 29,
              'deps': defaultdict(list, {}),
              'rel': 'punct'},
             38: {'address': 38,
              'word': 'Namun',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 29,
              'deps': defaultdict(list, {}),
              'rel': 'mark'},
             39: {'address': 39,
              'word': ',',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 42,
              'deps': defaultdict(list, {}),
              'rel': 'punct'},
             42: {'address': 42,
              'word': 'berbeza',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 11,
              'deps': defaultdict(list,
                          {'punct': [39, 54, 89],
                           'nsubj': [40],
                           'advcl': [44],
                           'dep': [55]}),
              'rel': 'dep'},
             40: {'address': 40,
              'word': 'situasi',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 42,
              'deps': defaultdict(list, {'det': [41]}),
              'rel': 'nsubj'},
             41: {'address': 41,
              'word': 'itu',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 40,
              'deps': defaultdict(list, {}),
              'rel': 'det'},
             43: {'address': 43,
              'word': 'apabila',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 44,
              'deps': defaultdict(list, {}),
              'rel': 'mark'},
             44: {'address': 44,
              'word': 'melibatkan',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 42,
              'deps': defaultdict(list, {'mark': [43], 'obj': [45]}),
              'rel': 'advcl'},
             45: {'address': 45,
              'word': 'isu',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 44,
              'deps': defaultdict(list, {'compound': [46], 'nmod': [48]}),
              'rel': 'obj'},
             46: {'address': 46,
              'word': 'ketidakpatuhan',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 45,
              'deps': defaultdict(list, {}),
              'rel': 'compound'},
             47: {'address': 47,
              'word': 'terhadap',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 48,
              'deps': defaultdict(list, {}),
              'rel': 'case'},
             48: {'address': 48,
              'word': 'prosedur',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 45,
              'deps': defaultdict(list,
                          {'case': [47],
                           'compound': [49],
                           'amod': [50],
                           'appos': [52]}),
              'rel': 'nmod'},
             49: {'address': 49,
              'word': 'operasi',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 48,
              'deps': defaultdict(list, {}),
              'rel': 'compound'},
             50: {'address': 50,
              'word': 'standard',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 48,
              'deps': defaultdict(list, {}),
              'rel': 'amod'},
             51: {'address': 51,
              'word': '(',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 52,
              'deps': defaultdict(list, {}),
              'rel': 'punct'},
             52: {'address': 52,
              'word': 'SOP',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 48,
              'deps': defaultdict(list, {'punct': [51, 53]}),
              'rel': 'appos'},
             53: {'address': 53,
              'word': ')',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 52,
              'deps': defaultdict(list, {}),
              'rel': 'punct'},
             54: {'address': 54,
              'word': '.',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 42,
              'deps': defaultdict(list, {}),
              'rel': 'punct'},
             55: {'address': 55,
              'word': 'Najib',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 42,
              'deps': defaultdict(list,
                          {'punct': [56], 'nsubj': [59], 'acl': [62]}),
              'rel': 'dep'},
             56: {'address': 56,
              'word': ',',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 55,
              'deps': defaultdict(list, {}),
              'rel': 'punct'},
             57: {'address': 57,
              'word': 'yang',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 59,
              'deps': defaultdict(list, {}),
              'rel': 'nsubj'},
             59: {'address': 59,
              'word': 'Ahli',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 55,
              'deps': defaultdict(list,
                          {'nsubj': [57], 'advmod': [58], 'flat': [60]}),
              'rel': 'nsubj'},
             58: {'address': 58,
              'word': 'juga',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 59,
              'deps': defaultdict(list, {}),
              'rel': 'advmod'},
             60: {'address': 60,
              'word': 'Parlimen',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 59,
              'deps': defaultdict(list, {'flat': [61]}),
              'rel': 'flat'},
             61: {'address': 61,
              'word': 'Pekan',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 60,
              'deps': defaultdict(list, {}),
              'rel': 'flat'},
             62: {'address': 62,
              'word': 'memuji',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 55,
              'deps': defaultdict(list, {'obj': [63]}),
              'rel': 'acl'},
             63: {'address': 63,
              'word': 'sikap',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 62,
              'deps': defaultdict(list, {'flat': [64], 'acl': [69]}),
              'rel': 'obj'},
             64: {'address': 64,
              'word': 'Ahli',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 63,
              'deps': defaultdict(list, {'flat': [65]}),
              'rel': 'flat'},
             65: {'address': 65,
              'word': 'Parlimen',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 64,
              'deps': defaultdict(list, {'flat': [66]}),
              'rel': 'flat'},
             66: {'address': 66,
              'word': 'Langkawi',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 65,
              'deps': defaultdict(list, {'det': [67]}),
              'rel': 'flat'},
             67: {'address': 67,
              'word': 'itu',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 66,
              'deps': defaultdict(list, {}),
              'rel': 'det'},
             68: {'address': 68,
              'word': 'yang',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 69,
              'deps': defaultdict(list, {}),
              'rel': 'nsubj'},
             69: {'address': 69,
              'word': 'mengaku',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 63,
              'deps': defaultdict(list, {'nsubj': [68], 'xcomp': [70]}),
              'rel': 'acl'},
             70: {'address': 70,
              'word': 'bersalah',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 69,
              'deps': defaultdict(list, {'xcomp': [72]}),
              'rel': 'xcomp'},
             71: {'address': 71,
              'word': 'selepas',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 72,
              'deps': defaultdict(list, {}),
              'rel': 'case'},
             72: {'address': 72,
              'word': 'melanggar',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 70,
              'deps': defaultdict(list,
                          {'case': [71], 'obj': [73], 'advcl': [76]}),
              'rel': 'xcomp'},
             73: {'address': 73,
              'word': 'SOP',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 72,
              'deps': defaultdict(list, {}),
              'rel': 'obj'},
             74: {'address': 74,
              'word': 'kerana',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 76,
              'deps': defaultdict(list, {}),
              'rel': 'mark'},
             76: {'address': 76,
              'word': 'mengambil',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 72,
              'deps': defaultdict(list,
                          {'mark': [74],
                           'advmod': [75],
                           'obj': [77],
                           'advcl': [80]}),
              'rel': 'advcl'},
             75: {'address': 75,
              'word': 'tidak',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 76,
              'deps': defaultdict(list, {}),
              'rel': 'advmod'},
             77: {'address': 77,
              'word': 'suhu',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 76,
              'deps': defaultdict(list, {'compound': [78]}),
              'rel': 'obj'},
             78: {'address': 78,
              'word': 'badan',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 77,
              'deps': defaultdict(list, {}),
              'rel': 'compound'},
             79: {'address': 79,
              'word': 'ketika',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 80,
              'deps': defaultdict(list, {}),
              'rel': 'mark'},
             80: {'address': 80,
              'word': 'masuk',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 76,
              'deps': defaultdict(list, {'mark': [79], 'obl': [83, 85, 87]}),
              'rel': 'advcl'},
             81: {'address': 81,
              'word': 'ke',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 83,
              'deps': defaultdict(list, {}),
              'rel': 'case'},
             83: {'address': 83,
              'word': 'surau',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 80,
              'deps': defaultdict(list, {'case': [81], 'det': [82]}),
              'rel': 'obl'},
             82: {'address': 82,
              'word': 'sebuah',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 83,
              'deps': defaultdict(list, {}),
              'rel': 'det'},
             84: {'address': 84,
              'word': 'di',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 85,
              'deps': defaultdict(list, {}),
              'rel': 'case'},
             85: {'address': 85,
              'word': 'Langkawi',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 80,
              'deps': defaultdict(list, {'case': [84]}),
              'rel': 'obl'},
             86: {'address': 86,
              'word': 'pada',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 87,
              'deps': defaultdict(list, {}),
              'rel': 'case'},
             87: {'address': 87,
              'word': 'Sabtu',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 80,
              'deps': defaultdict(list, {'case': [86], 'amod': [88]}),
              'rel': 'obl'},
             88: {'address': 88,
              'word': 'lalu',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 87,
              'deps': defaultdict(list, {}),
              'rel': 'amod'},
             89: {'address': 89,
              'word': '.',
              'lemma': '_',
              'ctag': '_',
              'tag': '_',
              'feats': '_',
              'head': 42,
              'deps': defaultdict(list, {}),
              'rel': 'punct'}})

Flatten the graph#

[20]:
list(graph.triples())
[20]:
[(('melihat', '_'), 'nsubj', ('KUALA', '_')),
 (('KUALA', '_'), 'flat', ('LUMPUR', '_')),
 (('KUALA', '_'), 'obl', ('hal', '_')),
 (('hal', '_'), 'punct', (':', '_')),
 (('hal', '_'), 'case', ('Dalam', '_')),
 (('hal', '_'), 'compound', ('politik', '_')),
 (('KUALA', '_'), 'punct', (',', '_')),
 (('melihat', '_'), 'advmod', ('jarang', '_')),
 (('melihat', '_'), 'advmod', ('sekali', '_')),
 (('melihat', '_'), 'case', ('untuk', '_')),
 (('melihat', '_'), 'advcl', ('mempunyai', '_')),
 (('mempunyai', '_'), 'obj', ('figura', '_')),
 (('figura', '_'), 'nummod', ('dua', '_')),
 (('figura', '_'), 'punct', ('-', '_')),
 (('figura', '_'), 'compound:plur', ('bekas', '_')),
 (('figura', '_'), 'flat', ('Perdana', '_')),
 (('Perdana', '_'), 'det', ('ini', '_')),
 (('Perdana', '_'), 'flat', ('Menteri', '_')),
 (('Perdana', '_'), 'punct', (',', '_')),
 (('Perdana', '_'), 'appos', ('Datuk', '_')),
 (('Datuk', '_'), 'flat', ('Seri', '_')),
 (('Seri', '_'), 'flat', ('Najib', '_')),
 (('Najib', '_'), 'flat', ('Razak', '_')),
 (('Perdana', '_'), 'conj', ('Tun', '_')),
 (('Tun', '_'), 'cc', ('dan', '_')),
 (('Tun', '_'), 'flat', ('Dr', '_')),
 (('Dr', '_'), 'flat', ('Mahathir', '_')),
 (('Mahathir', '_'), 'flat', ('Mohamad', '_')),
 (('mempunyai', '_'), 'obj', ('pandangan', '_')),
 (('pandangan', '_'), 'punct', ("'", '_')),
 (('pandangan', '_'), 'amod', ('sama', '_')),
 (('sama', '_'), 'conj', ('sekapal', '_')),
 (('sekapal', '_'), 'nsubj', ('yang', '_')),
 (('sekapal', '_'), 'punct', ("'", '_')),
 (('sekapal', '_'), 'cc', ('atau', '_')),
 (('mempunyai', '_'), 'punct', ('.', '_')),
 (('mempunyai', '_'), 'mark', ('Namun', '_')),
 (('melihat', '_'), 'dep', ('berbeza', '_')),
 (('berbeza', '_'), 'punct', (',', '_')),
 (('berbeza', '_'), 'nsubj', ('situasi', '_')),
 (('situasi', '_'), 'det', ('itu', '_')),
 (('berbeza', '_'), 'advcl', ('melibatkan', '_')),
 (('melibatkan', '_'), 'mark', ('apabila', '_')),
 (('melibatkan', '_'), 'obj', ('isu', '_')),
 (('isu', '_'), 'compound', ('ketidakpatuhan', '_')),
 (('isu', '_'), 'nmod', ('prosedur', '_')),
 (('prosedur', '_'), 'case', ('terhadap', '_')),
 (('prosedur', '_'), 'compound', ('operasi', '_')),
 (('prosedur', '_'), 'amod', ('standard', '_')),
 (('prosedur', '_'), 'appos', ('SOP', '_')),
 (('SOP', '_'), 'punct', ('(', '_')),
 (('SOP', '_'), 'punct', (')', '_')),
 (('berbeza', '_'), 'punct', ('.', '_')),
 (('berbeza', '_'), 'dep', ('Najib', '_')),
 (('Najib', '_'), 'punct', (',', '_')),
 (('Najib', '_'), 'nsubj', ('Ahli', '_')),
 (('Ahli', '_'), 'nsubj', ('yang', '_')),
 (('Ahli', '_'), 'advmod', ('juga', '_')),
 (('Ahli', '_'), 'flat', ('Parlimen', '_')),
 (('Parlimen', '_'), 'flat', ('Pekan', '_')),
 (('Najib', '_'), 'acl', ('memuji', '_')),
 (('memuji', '_'), 'obj', ('sikap', '_')),
 (('sikap', '_'), 'flat', ('Ahli', '_')),
 (('Ahli', '_'), 'flat', ('Parlimen', '_')),
 (('Parlimen', '_'), 'flat', ('Langkawi', '_')),
 (('Langkawi', '_'), 'det', ('itu', '_')),
 (('sikap', '_'), 'acl', ('mengaku', '_')),
 (('mengaku', '_'), 'nsubj', ('yang', '_')),
 (('mengaku', '_'), 'xcomp', ('bersalah', '_')),
 (('bersalah', '_'), 'xcomp', ('melanggar', '_')),
 (('melanggar', '_'), 'case', ('selepas', '_')),
 (('melanggar', '_'), 'obj', ('SOP', '_')),
 (('melanggar', '_'), 'advcl', ('mengambil', '_')),
 (('mengambil', '_'), 'mark', ('kerana', '_')),
 (('mengambil', '_'), 'advmod', ('tidak', '_')),
 (('mengambil', '_'), 'obj', ('suhu', '_')),
 (('suhu', '_'), 'compound', ('badan', '_')),
 (('mengambil', '_'), 'advcl', ('masuk', '_')),
 (('masuk', '_'), 'mark', ('ketika', '_')),
 (('masuk', '_'), 'obl', ('surau', '_')),
 (('surau', '_'), 'case', ('ke', '_')),
 (('surau', '_'), 'det', ('sebuah', '_')),
 (('masuk', '_'), 'obl', ('Langkawi', '_')),
 (('Langkawi', '_'), 'case', ('di', '_')),
 (('masuk', '_'), 'obl', ('Sabtu', '_')),
 (('Sabtu', '_'), 'case', ('pada', '_')),
 (('Sabtu', '_'), 'amod', ('lalu', '_')),
 (('berbeza', '_'), 'punct', ('.', '_'))]
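
Each triple has the form ((head word, head tag), relation, (dependent word, dependent tag)). As a minimal sketch of how such a flat list can be queried, the snippet below groups a few triples copied from the output above by relation. `dependents_by_relation` is a hypothetical helper written for this example, not part of Malaya.

```python
from collections import defaultdict

# A few triples copied from the output above:
# ((head word, head tag), relation, (dependent word, dependent tag))
triples = [
    (('melihat', '_'), 'nsubj', ('KUALA', '_')),
    (('KUALA', '_'), 'flat', ('LUMPUR', '_')),
    (('melihat', '_'), 'advmod', ('jarang', '_')),
    (('melihat', '_'), 'advmod', ('sekali', '_')),
    (('mempunyai', '_'), 'obj', ('figura', '_')),
]


def dependents_by_relation(triples):
    """Hypothetical helper: map each relation to its (head, dependent) word pairs."""
    grouped = defaultdict(list)
    for (head, _), rel, (dep, _) in triples:
        grouped[rel].append((head, dep))
    return dict(grouped)


grouped = dependents_by_relation(triples)
print(grouped['advmod'])  # [('melihat', 'jarang'), ('melihat', 'sekali')]
```

With the full parse, the same helper could be called as `dependents_by_relation(graph.triples())`.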

Check whether the graph contains cycles#

[21]:
graph.contains_cycle()
[21]:
False
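
A valid dependency tree has no cycles: following the 'head' address of any token should eventually reach the root. The sketch below illustrates that idea on toy data. It is only an illustration and not necessarily how `graph.contains_cycle()` is implemented; `has_cycle` and both toy dictionaries are hypothetical.

```python
def has_cycle(heads):
    """Return True if following head pointers from any token loops back.

    `heads` maps a token address to its head address; 0 stands for the
    artificial root, mirroring the 'head' field in the graph nodes.
    """
    for start in heads:
        seen = set()
        node = start
        while node != 0:
            if node in seen:
                return True
            seen.add(node)
            # A missing address is treated as attached to the root.
            node = heads.get(node, 0)
    return False


# Toy data (made up for this example): a proper tree and a broken parse.
tree = {1: 2, 2: 0, 3: 2}
looped = {1: 2, 2: 3, 3: 1}

print(has_cycle(tree), has_cycle(looped))  # False True
```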

Generate a networkx graph#

Make sure networkx is already installed:

pip install networkx
[22]:
digraph = graph.to_networkx()
digraph
[22]:
<networkx.classes.multidigraph.MultiDiGraph at 0x16ab39d50>
[23]:
import networkx as nx
import matplotlib.pyplot as plt
nx.draw_networkx(digraph)
plt.show()
_images/load-dependency_37_0.png
[24]:
digraph.edges()
[24]:
OutMultiEdgeDataView([(1, 11), (2, 1), (3, 5), (4, 5), (5, 1), (6, 5), (7, 1), (8, 11), (9, 11), (10, 11), (12, 13), (13, 29), (14, 17), (15, 13), (16, 13), (17, 13), (18, 17), (19, 17), (20, 17), (21, 20), (22, 21), (23, 22), (24, 25), (25, 17), (26, 25), (27, 26), (28, 27), (29, 11), (30, 31), (31, 29), (32, 36), (33, 31), (34, 36), (35, 36), (36, 33), (37, 29), (38, 29), (39, 42), (40, 42), (41, 40), (42, 11), (43, 44), (44, 42), (45, 44), (46, 45), (47, 48), (48, 45), (49, 48), (50, 48), (51, 52), (52, 48), (53, 52), (54, 42), (55, 42), (56, 55), (57, 59), (58, 59), (59, 55), (60, 59), (61, 60), (62, 55), (63, 62), (64, 63), (65, 64), (66, 65), (67, 66), (68, 69), (69, 63), (70, 69), (71, 72), (72, 70), (73, 72), (74, 76), (75, 76), (76, 72), (77, 76), (78, 77), (79, 80), (80, 76), (81, 83), (82, 83), (83, 80), (84, 85), (85, 80), (86, 87), (87, 80), (88, 87), (89, 42)])
[25]:
digraph.nodes()
[25]:
NodeView((1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89))
[26]:
labels = {i:graph.get_by_address(i)['word'] for i in digraph.nodes()}
labels
[26]:
{1: 'KUALA',
 2: 'LUMPUR',
 3: ':',
 4: 'Dalam',
 5: 'hal',
 6: 'politik',
 7: ',',
 8: 'jarang',
 9: 'sekali',
 10: 'untuk',
 11: 'melihat',
 12: 'dua',
 13: 'figura',
 14: 'ini',
 15: '-',
 16: 'bekas',
 17: 'Perdana',
 18: 'Menteri',
 19: ',',
 20: 'Datuk',
 21: 'Seri',
 22: 'Najib',
 23: 'Razak',
 24: 'dan',
 25: 'Tun',
 26: 'Dr',
 27: 'Mahathir',
 28: 'Mohamad',
 29: 'mempunyai',
 30: "'",
 31: 'pandangan',
 32: 'yang',
 33: 'sama',
 34: "'",
 35: 'atau',
 36: 'sekapal',
 37: '.',
 38: 'Namun',
 39: ',',
 40: 'situasi',
 41: 'itu',
 42: 'berbeza',
 43: 'apabila',
 44: 'melibatkan',
 45: 'isu',
 46: 'ketidakpatuhan',
 47: 'terhadap',
 48: 'prosedur',
 49: 'operasi',
 50: 'standard',
 51: '(',
 52: 'SOP',
 53: ')',
 54: '.',
 55: 'Najib',
 56: ',',
 57: 'yang',
 58: 'juga',
 59: 'Ahli',
 60: 'Parlimen',
 61: 'Pekan',
 62: 'memuji',
 63: 'sikap',
 64: 'Ahli',
 65: 'Parlimen',
 66: 'Langkawi',
 67: 'itu',
 68: 'yang',
 69: 'mengaku',
 70: 'bersalah',
 71: 'selepas',
 72: 'melanggar',
 73: 'SOP',
 74: 'kerana',
 75: 'tidak',
 76: 'mengambil',
 77: 'suhu',
 78: 'badan',
 79: 'ketika',
 80: 'masuk',
 81: 'ke',
 82: 'sebuah',
 83: 'surau',
 84: 'di',
 85: 'Langkawi',
 86: 'pada',
 87: 'Sabtu',
 88: 'lalu',
 89: '.'}
[27]:
plt.figure(figsize=(15,5))
nx.draw_networkx(digraph,labels=labels)
plt.show()
_images/load-dependency_41_0.png
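
In the edge list above, each pair appears to point from a dependent to its head (for example, (2, 1) links 'LUMPUR' to 'KUALA'). Under that assumption, and given that the graph has no cycles, the chain of heads from any token up to the root can be walked with plain Python. The sketch below uses a subset of the edges and labels shown above; `path_to_root` is a hypothetical helper.

```python
# Edges copied from the digraph.edges() output, read as (dependent, head).
edges = [(88, 87), (87, 80), (80, 76), (76, 72), (72, 70), (70, 69),
         (69, 63), (63, 62), (62, 55), (55, 42), (42, 11)]

# Matching entries copied from the labels dictionary.
words = {88: 'lalu', 87: 'Sabtu', 80: 'masuk', 76: 'mengambil',
         72: 'melanggar', 70: 'bersalah', 69: 'mengaku', 63: 'sikap',
         62: 'memuji', 55: 'Najib', 42: 'berbeza', 11: 'melihat'}

# Every token has exactly one head, so the edges fit in a dict.
head_of = dict(edges)


def path_to_root(address, head_of):
    """Hypothetical helper: follow head links until a token has no head edge."""
    path = [address]
    while path[-1] in head_of:
        path.append(head_of[path[-1]])
    return path


path = path_to_root(88, head_of)
print(' -> '.join(words[i] for i in path))
```

The last token on the path is the one with no outgoing edge, which is the root of the parse ('melihat', address 11).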

Vectorize#

Say you want to visualize word-level vectors in a lower dimension; you can use model.vectorize:

def vectorize(self, string: str):
    """
    vectorize a string.

    Parameters
    ----------
    string: str

    Returns
    -------
    result: np.array
    """
[28]:
r = quantized_model.vectorize(s)
[29]:
x = [i[0] for i in r]
y = [i[1] for i in r]
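
The two list comprehensions above treat the result as a sequence of (word, vector) pairs. As a small sketch of the same split on stand-in data, `zip(*pairs)` separates both columns in one step; the words come from the example sentence, while the numbers are made up and much shorter than real model vectors.

```python
# Toy stand-in for the vectorize output: (word, vector) pairs.
# The vector values are invented for illustration only.
pairs = [
    ('KUALA', [0.1, 0.2, 0.3]),
    ('LUMPUR', [0.0, -0.4, 0.5]),
    ('melihat', [0.7, 0.1, -0.2]),
]

# Unzip the pairs into one list of words and one list of vectors.
words, vectors = (list(column) for column in zip(*pairs))

# Every vector must have the same length before it is passed to TSNE.
assert all(len(v) == len(vectors[0]) for v in vectors)

print(words)                          # ['KUALA', 'LUMPUR', 'melihat']
print(len(vectors), len(vectors[0]))  # 3 3
```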
[30]:
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

tsne = TSNE().fit_transform(y)
tsne.shape
[30]:
(89, 2)
[31]:
plt.figure(figsize = (7, 7))
plt.scatter(tsne[:, 0], tsne[:, 1])
# Annotate each point with its word. The loop uses its own variable
# names so the lists x and y from the earlier cell are not overwritten.
for word, px, py in zip(x, tsne[:, 0], tsne[:, 1]):
    plt.annotate(
        word,
        xy = (px, py),
        xytext = (0, 0),
        textcoords = 'offset points',
    )
plt.show()
_images/load-dependency_46_0.png