Models Accuracy

Dependency Parsing

Trained on 80% of the dataset and tested on the remaining 20%. A link to download the dataset is available inside the notebooks. All training sessions are stored in session/dependency.
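Every task below follows the same 80/20 protocol. A minimal sketch of such a split (the helper name and seed are illustrative assumptions; the actual split code lives inside the notebooks):

```python
import random

def split_80_20(data, seed=42):
    # Shuffle a copy so the original ordering is untouched, then slice:
    # the first 80% becomes the training set, the remaining 20% the test set.
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]

sentences = ['sentence-%d' % i for i in range(100)]
train, test = split_80_20(sentences)
print(len(train), len(test))  # 80 20
```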

The chart below shows F1 accuracy for dependency tagging.

from IPython.display import Image, display

display(Image('dependency-accuracy.png', width=500))

bert-bahasa-base

arc accuracy: 0.849296829484373
types accuracy: 0.8413393854963266
root accuracy: 0.9214722222222222
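The three headline numbers can be read as token- and sentence-level accuracies. A minimal sketch, assuming arc accuracy means correct head attachment per token, types accuracy means correct dependency label per token, and root accuracy means correct root identification per sentence (these definitions are our assumption, not stated by the source):

```python
def token_accuracy(pred, gold):
    # Fraction of positions where the prediction matches the gold annotation.
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)

# Toy single-sentence example; heads are token indices, 0 marking the root.
gold_heads = [0, 1, 1, 3]
pred_heads = [0, 1, 2, 3]
gold_types = ['root', 'nsubj', 'obj', 'punct']
pred_types = ['root', 'nsubj', 'obl', 'punct']

arc = token_accuracy(pred_heads, gold_heads)              # correct head per token
types = token_accuracy(pred_types, gold_types)            # correct label per token
root = float(pred_heads.index(0) == gold_heads.index(0))  # correct root per sentence
print(arc, types, root)  # 0.75 0.75 1.0
```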

               precision    recall  f1-score   support

          PAD    0.99998   1.00000   0.99999    925808
            X    0.99999   0.99987   0.99993    155449
          acl    0.95880   0.95327   0.95603      6078
        advcl    0.94197   0.92524   0.93353      2421
       advmod    0.96066   0.97033   0.96547      9538
         amod    0.95217   0.93538   0.94370      8279
        appos    0.95024   0.95816   0.95418      4923
          aux    1.00000   1.00000   1.00000         7
         case    0.98474   0.98719   0.98597     21707
           cc    0.98337   0.98033   0.98185      6456
        ccomp    0.95352   0.90681   0.92958       837
     compound    0.94585   0.95337   0.94959     13338
compound:plur    0.96429   0.97878   0.97148      1131
         conj    0.96874   0.96504   0.96689      8639
          cop    0.97760   0.98582   0.98169      1904
        csubj    0.95122   0.84783   0.89655        46
   csubj:pass    0.93750   0.75000   0.83333        20
          dep    0.92559   0.90528   0.91532      1003
          det    0.97356   0.95704   0.96523      8194
        fixed    0.95936   0.92525   0.94200      1097
         flat    0.97521   0.97473   0.97497     20660
         iobj    0.92308   0.68571   0.78689        35
         mark    0.95677   0.95951   0.95814      2791
         nmod    0.95451   0.94421   0.94933      8156
        nsubj    0.96610   0.96329   0.96469     12750
   nsubj:pass    0.93547   0.96120   0.94816      4072
       nummod    0.97423   0.98608   0.98012      7975
          obj    0.96661   0.96129   0.96394     10540
          obl    0.96263   0.96778   0.96520     11420
    parataxis    0.91954   0.89510   0.90716       715
        punct    0.99653   0.99769   0.99711     33381
         root    0.97442   0.97955   0.97698     10073
        xcomp    0.92390   0.94021   0.93198      2492

     accuracy                        0.99491   1301935
    macro avg    0.96116   0.94247   0.95082   1301935
 weighted avg    0.99491   0.99491   0.99490   1301935

xlnet-bahasa-base

arc accuracy: 0.929225039681087
types accuracy: 0.9236558875163555
root accuracy: 0.9467261904761904

               precision    recall  f1-score   support

          PAD    0.99999   1.00000   1.00000    678576
            X    0.99999   0.99998   0.99999    168979
          acl    0.97833   0.96736   0.97281      6066
        advcl    0.96546   0.95408   0.95974      2461
       advmod    0.98121   0.97411   0.97765      9542
         amod    0.95723   0.96440   0.96080      8146
        appos    0.97907   0.98380   0.98143      4754
          aux    1.00000   1.00000   1.00000         8
         case    0.99254   0.98906   0.99080     21390
           cc    0.98717   0.99115   0.98916      6442
        ccomp    0.94539   0.92738   0.93630       840
     compound    0.96723   0.97201   0.96962     13362
compound:plur    0.98157   0.99351   0.98751      1233
         conj    0.98292   0.98832   0.98561      8735
          cop    0.97863   0.99565   0.98707      1840
        csubj    1.00000   0.93182   0.96471        44
   csubj:pass    0.94737   0.90000   0.92308        20
          dep    0.95876   0.96875   0.96373       960
          det    0.98030   0.96707   0.97364      8077
        fixed    0.98174   0.94797   0.96456      1134
         flat    0.98401   0.98920   0.98660     20096
         iobj    1.00000   0.84848   0.91803        33
         mark    0.96202   0.98326   0.97253      2808
         nmod    0.96866   0.97038   0.96952      7867
        nsubj    0.98233   0.97675   0.97953     12689
   nsubj:pass    0.95407   0.97905   0.96640      4010
       nummod    0.98587   0.99263   0.98923      7730
          obj    0.98107   0.97622   0.97864     10512
          obl    0.98344   0.97984   0.98164     11456
    parataxis    0.94509   0.96035   0.95266       681
        punct    0.99964   0.99949   0.99956     33118
         root    0.98685   0.98312   0.98498     10073
        xcomp    0.96063   0.95602   0.95832      2501

     accuracy                        0.99636   1066183
    macro avg    0.97753   0.97004   0.97351   1066183
 weighted avg    0.99637   0.99636   0.99636   1066183

albert-bahasa-base

arc accuracy: 0.7974828611806026
types accuracy: 0.784127773549449
root accuracy: 0.8793373015873015

               precision    recall  f1-score   support

          PAD    0.99994   1.00000   0.99997   1038963
            X    0.99994   0.99980   0.99987    196107
          acl    0.89394   0.91141   0.90259      6039
        advcl    0.86349   0.80405   0.83271      2368
       advmod    0.94060   0.91095   0.92554      9422
         amod    0.86990   0.89181   0.88072      8217
        appos    0.90128   0.90034   0.90081      4766
          aux    0.88889   0.88889   0.88889         9
         case    0.95841   0.97824   0.96822     21274
           cc    0.96729   0.95993   0.96359      6438
        ccomp    0.81659   0.80809   0.81232       865
     compound    0.90899   0.90737   0.90818     13473
compound:plur    0.92714   0.95513   0.94093      1159
         conj    0.94037   0.92424   0.93223      8566
          cop    0.94305   0.95779   0.95036      1919
        csubj    0.86667   0.68421   0.76471        38
   csubj:pass    0.71429   0.71429   0.71429        14
          dep    0.85821   0.78004   0.81726      1032
          det    0.93597   0.90688   0.92120      8108
        fixed    0.91759   0.81228   0.86173      1124
         flat    0.95904   0.94476   0.95184     20744
         iobj    1.00000   0.53191   0.69444        47
         mark    0.91366   0.89960   0.90657      2729
         nmod    0.89832   0.88939   0.89383      8046
        nsubj    0.91077   0.93174   0.92114     12730
   nsubj:pass    0.89668   0.88184   0.88920      3986
       nummod    0.95178   0.95529   0.95353      7851
          obj    0.91365   0.92349   0.91854     10495
          obl    0.91312   0.93081   0.92188     11201
    parataxis    0.74352   0.75589   0.74966       721
        punct    0.99166   0.99655   0.99410     33040
         root    0.93833   0.94262   0.94047     10073
        xcomp    0.86927   0.85462   0.86189      2552

     accuracy                        0.99023   1454116
    macro avg    0.90946   0.88286   0.89343   1454116
 weighted avg    0.99022   0.99023   0.99021   1454116

Emotion Analysis

Trained on 80% of the dataset and tested on the remaining 20%. All training sessions are stored in session/emotion.

Graph based on F1-score.

from IPython.display import Image, display

display(Image('emotion-accuracy.png', width=500))

multinomial

              precision    recall  f1-score   support

       anger    0.87563   0.88483   0.88021     14092
        fear    0.75967   0.86772   0.81011      7628
         joy    0.83213   0.87847   0.85467     13610
        love    0.87938   0.87004   0.87469     14882
     sadness    0.72419   0.65285   0.68667     19208
    surprise    0.50147   0.50461   0.50303      9445

    accuracy                        0.77725     78865
   macro avg    0.76208   0.77642   0.76823     78865
weighted avg    0.77592   0.77725   0.77567     78865
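The macro avg and weighted avg rows differ only in how the per-class F1 scores are pooled: macro is the plain mean, weighted weights each class by its support. Recomputing both from the rounded per-class values in the report above recovers the printed averages:

```python
# Per-class F1 scores and supports copied from the multinomial report above.
f1 = {'anger': 0.88021, 'fear': 0.81011, 'joy': 0.85467,
      'love': 0.87469, 'sadness': 0.68667, 'surprise': 0.50303}
support = {'anger': 14092, 'fear': 7628, 'joy': 13610,
           'love': 14882, 'sadness': 19208, 'surprise': 9445}

macro = sum(f1.values()) / len(f1)                        # unweighted mean
total = sum(support.values())
weighted = sum(f1[c] * support[c] for c in f1) / total    # support-weighted mean
print(round(macro, 5), round(weighted, 5))  # 0.76823 0.77567
```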

bert-bahasa-base

              precision    recall  f1-score   support

       anger    0.92970   0.92983   0.92976     14094
        fear    0.91783   0.84544   0.88015      7518
       happy    0.91390   0.94365   0.92854     13914
        love    0.94439   0.94087   0.94263     14765
     sadness    0.92728   0.68775   0.78975     18985
    surprise    0.62770   0.96131   0.75948      9590

    accuracy                        0.87185     78866
   macro avg    0.87680   0.88481   0.87172     78866
weighted avg    0.89123   0.87185   0.87282     78866

bert-bahasa-small

              precision    recall  f1-score   support

       anger    0.92893   0.92926   0.92909     14065
        fear    0.82324   0.93199   0.87425      7616
       happy    0.92466   0.91592   0.92027     13641
        love    0.93434   0.94386   0.93907     14926
     sadness    0.77547   0.88596   0.82704     19187
    surprise    0.81459   0.48913   0.61124      9431

    accuracy                        0.86681     78866
   macro avg    0.86687   0.84935   0.85016     78866
weighted avg    0.86800   0.86681   0.86132     78866

xlnet-bahasa-base

              precision    recall  f1-score   support

       anger    0.91827   0.94797   0.93288     14164
        fear    0.86772   0.88987   0.87865      7482
       happy    0.91894   0.93049   0.92468     13768
        love    0.92884   0.94967   0.93914     14940
     sadness    0.96163   0.66883   0.78893     18996
    surprise    0.63289   0.94063   0.75667      9516

    accuracy                        0.87161     78866
   macro avg    0.87138   0.88791   0.87016     78866
weighted avg    0.89160   0.87161   0.87156     78866

albert-bahasa-base

              precision    recall  f1-score   support

       anger    0.90896   0.93591   0.92224     14370
        fear    0.86102   0.86502   0.86301      7527
       happy    0.92940   0.90445   0.91675     13710
        love    0.94295   0.92313   0.93294     14701
     sadness    0.85928   0.72104   0.78412     19114
    surprise    0.63000   0.84953   0.72348      9444

    accuracy                        0.85887     78866
   macro avg    0.85527   0.86651   0.85709     78866
weighted avg    0.86883   0.85887   0.86035     78866

Entity Recognition

Trained on 80% of the dataset and tested on the remaining 20%. A link to download the dataset is available inside the notebooks. All training sessions are stored in session/entities.

Graph based on F1-score.

from IPython.core.display import Image, display

display(Image('ner-accuracy.png', width=500))

bert-bahasa-base

              precision    recall  f1-score   support

       OTHER    0.95875   0.99758   0.97778   5160854
         PAD    0.99819   1.00000   0.99910    817609
           X    0.99980   0.99981   0.99980   2744716
       event    0.00000   0.00000   0.00000    143787
         law    0.99814   0.87596   0.93307    146950
    location    0.84847   0.96940   0.90491    428869
organization    0.99131   0.74086   0.84798    694150
      person    0.85493   0.96896   0.90838    507960
    quantity    0.99338   0.97925   0.98626     88200
        time    0.98514   0.97960   0.98236    179880

    accuracy                        0.96433  10912975
   macro avg    0.86281   0.85114   0.85396  10912975
weighted avg    0.95354   0.96433   0.95722  10912975

bert-bahasa-small

              precision    recall  f1-score   support

       OTHER    0.96120   0.99734   0.97893   5160854
         PAD    0.99819   1.00000   0.99910    817609
           X    0.99989   0.99981   0.99985   2744716
       event    1.00000   0.00285   0.00569    143787
         law    0.99630   0.91865   0.95590    146950
    location    0.88747   0.96854   0.92623    428869
organization    0.99103   0.79324   0.88118    694150
      person    0.86779   0.97160   0.91677    507960
    quantity    0.98761   0.99141   0.98950     88200
        time    0.99219   0.97997   0.98604    179880

    accuracy                        0.96835  10912975
   macro avg    0.96817   0.86234   0.86392  10912975
weighted avg    0.97006   0.96835   0.96159  10912975

xlnet-bahasa-base

              precision    recall  f1-score   support

       OTHER    0.97309   0.99732   0.98506   5160854
         PAD    0.99957   1.00000   0.99978   1394994
           X    1.00000   0.99992   0.99996   3003425
       event    1.00000   0.05114   0.09730    143787
         law    0.99859   0.95089   0.97416    146950
    location    0.91452   0.99333   0.95230    428869
organization    0.99014   0.91186   0.94939    694150
      person    0.92191   0.98265   0.95131    507960
    quantity    0.98374   0.99266   0.98818     88200
        time    0.99380   0.98426   0.98901    179880

    accuracy                        0.98008  11749069
   macro avg    0.97754   0.88640   0.88865  11749069
weighted avg    0.98082   0.98008   0.97494  11749069

albert-bahasa-base

              precision    recall  f1-score   support

       OTHER    0.93555   0.99377   0.96378   5160854
         PAD    1.00000   1.00000   1.00000   1000356
           X    0.99997   1.00000   0.99998   4397539
       event    0.99247   0.02751   0.05354    143787
         law    0.99062   0.72384   0.83648    146950
    location    0.74938   0.96113   0.84215    428869
organization    0.98696   0.54544   0.70259    694150
      person    0.83895   0.93301   0.88348    507960
    quantity    0.98635   0.96909   0.97764     88200
        time    0.96563   0.92264   0.94364    179880

    accuracy                        0.95329  12748545
   macro avg    0.94459   0.80764   0.82033  12748545
weighted avg    0.95757   0.95329   0.94568  12748545

Language Detection

Trained on 80% of the dataset and tested on the remaining 20%. All training sessions are stored in session/language-detection.

Graph based on F1-score.

display(Image('language-detection-accuracy.png', width=500))

XGB

              precision    recall  f1-score   support

       OTHER       0.98      0.99      0.99      9424
         eng       1.00      0.99      0.99      9972
         ind       1.00      0.99      0.99     11511
         zlm       1.00      1.00      1.00     10679

   micro avg       0.99      0.99      0.99     41586
   macro avg       0.99      0.99      0.99     41586
weighted avg       0.99      0.99      0.99     41586

multinomial

              precision    recall  f1-score   support

       OTHER       1.00      0.97      0.99      9424
         eng       0.99      1.00      0.99      9972
         ind       1.00      1.00      1.00     11511
         zlm       0.99      1.00      0.99     10679

   micro avg       0.99      0.99      0.99     41586
   macro avg       0.99      0.99      0.99     41586
weighted avg       0.99      0.99      0.99     41586

SGD

              precision    recall  f1-score   support

       OTHER       0.97      0.99      0.98      9424
         eng       0.99      0.99      0.99      9972
         ind       1.00      0.99      0.99     11511
         zlm       1.00      1.00      1.00     10679

   micro avg       0.99      0.99      0.99     41586
   macro avg       0.99      0.99      0.99     41586
weighted avg       0.99      0.99      0.99     41586

Deep learning

              precision    recall  f1-score   support

       other       1.00      0.99      0.99      9445
     english       1.00      1.00      1.00      9987
  indonesian       1.00      1.00      1.00     11518
       malay       1.00      1.00      1.00     10636

   micro avg       1.00      1.00      1.00     41586
   macro avg       1.00      1.00      1.00     41586
weighted avg       1.00      1.00      1.00     41586

POS Recognition

Trained on 80% of the dataset and tested on the remaining 20%. A link to download the dataset is available inside the notebooks. All training sessions are stored in session/pos.

Graph based on F1-score.

display(Image('pos-accuracy.png', width=500))

bert-bahasa-base

              precision    recall  f1-score   support

         ADJ    0.86210   0.71916   0.78417     45666
         ADP    0.96119   0.95565   0.95841    119589
         ADV    0.86670   0.80498   0.83470     47760
         AUX    0.99048   0.99830   0.99437     10000
       CCONJ    0.96073   0.92806   0.94411     37171
         DET    0.94468   0.91233   0.92822     38839
        NOUN    0.89341   0.90842   0.90085    268329
         NUM    0.93258   0.91267   0.92252     41211
         PAD    0.98801   1.00000   0.99397    150331
        PART    0.83045   0.94309   0.88319      5500
        PRON    0.96061   0.94223   0.95133     48835
       PROPN    0.91972   0.92962   0.92464    227608
       PUNCT    0.99724   0.99863   0.99793    182824
       SCONJ    0.66382   0.87314   0.75423     15150
         SYM    0.98408   0.92722   0.95481      3600
        VERB    0.93339   0.95044   0.94184    124518
           X    0.99984   0.99857   0.99920    501714

    accuracy                        0.95174   1868645
   macro avg    0.92288   0.92368   0.92168   1868645
weighted avg    0.95218   0.95174   0.95161   1868645

bert-bahasa-small

              precision    recall  f1-score   support

         ADJ    0.78068   0.77441   0.77753     45666
         ADP    0.96979   0.94450   0.95698    119589
         ADV    0.84482   0.80980   0.82694     47760
         AUX    0.99442   0.99830   0.99636     10000
       CCONJ    0.95610   0.93046   0.94310     37171
         DET    0.91002   0.94263   0.92604     38839
        NOUN    0.89615   0.89397   0.89506    268329
         NUM    0.93547   0.90692   0.92097     41211
         PAD    0.98801   1.00000   0.99397    150331
        PART    0.88135   0.93327   0.90657      5500
        PRON    0.96430   0.93761   0.95077     48835
       PROPN    0.90880   0.94060   0.92443    227608
       PUNCT    0.99784   0.99834   0.99809    182824
       SCONJ    0.68205   0.87617   0.76702     15150
         SYM    0.96822   0.91389   0.94027      3600
        VERB    0.96111   0.91939   0.93979    124518
           X    0.99979   0.99856   0.99918    501714

    accuracy                        0.95006   1868645
   macro avg    0.91994   0.92464   0.92136   1868645
weighted avg    0.95078   0.95006   0.95021   1868645

xlnet-bahasa-base

              precision    recall  f1-score   support

         ADJ    0.85134   0.76284   0.80467     45666
         ADP    0.96919   0.95234   0.96069    119589
         ADV    0.84419   0.83520   0.83967     47760
         AUX    0.99502   0.99930   0.99716     10000
       CCONJ    0.95966   0.92860   0.94387     37171
         DET    0.94171   0.93254   0.93710     38839
        NOUN    0.90569   0.90462   0.90516    268329
         NUM    0.94990   0.91369   0.93144     41211
         PAD    0.99741   1.00000   0.99871    154308
        PART    0.90704   0.93491   0.92076      5500
        PRON    0.97384   0.93777   0.95547     48835
       PROPN    0.90716   0.95069   0.92841    227608
       PUNCT    0.99810   0.99918   0.99864    182824
       SCONJ    0.66913   0.87393   0.75794     15150
         SYM    0.99347   0.92944   0.96039      3600
        VERB    0.95918   0.93351   0.94617    124518
           X    0.99990   0.99955   0.99972    536393

    accuracy                        0.95581   1907301
   macro avg    0.93070   0.92871   0.92859   1907301
weighted avg    0.95652   0.95581   0.95589   1907301

albert-bahasa-base

              precision    recall  f1-score   support

         ADJ    0.81972   0.73361   0.77428     45666
         ADP    0.97440   0.94106   0.95744    119589
         ADV    0.84503   0.80928   0.82677     47760
         AUX    0.99502   0.99830   0.99666     10000
       CCONJ    0.96896   0.92475   0.94634     37171
         DET    0.92684   0.94261   0.93466     38839
        NOUN    0.89857   0.88888   0.89370    268329
         NUM    0.94593   0.89027   0.91726     41211
         PAD    0.98892   1.00000   0.99443    162922
        PART    0.83716   0.92909   0.88073      5500
        PRON    0.96200   0.94148   0.95163     48835
       PROPN    0.89059   0.95483   0.92159    227608
       PUNCT    0.99693   0.99889   0.99791    182824
       SCONJ    0.65652   0.91670   0.76509     15150
         SYM    0.98240   0.88361   0.93039      3600
        VERB    0.95949   0.91441   0.93641    124518
           X    0.99984   0.99867   0.99925    624816

    accuracy                        0.95280   2004338
   macro avg    0.92049   0.92156   0.91909   2004338
weighted avg    0.95379   0.95280   0.95284   2004338

Relevancy

Trained on 80% of the dataset and tested on the remaining 20%. All training sessions are stored in session/relevancy.

Graph based on F1-score.

display(Image('relevancy-accuracy.png', width=500))

bert-bahasa-base

              precision    recall  f1-score   support

not relevant    0.86398   0.83633   0.84993      3000
    relevant    0.91074   0.92692   0.91876      5405

    accuracy                        0.89459      8405
   macro avg    0.88736   0.88163   0.88435      8405
weighted avg    0.89405   0.89459   0.89419      8405

xlnet-bahasa-base

              precision    recall  f1-score   support

not relevant    0.89978   0.81400   0.85474      3000
    relevant    0.90195   0.94968   0.92520      5405

    accuracy                        0.90125      8405
   macro avg    0.90086   0.88184   0.88997      8405
weighted avg    0.90118   0.90125   0.90005      8405

albert-bahasa-base

              precision    recall  f1-score   support

not relevant    0.88735   0.81400   0.84910      3000
    relevant    0.90129   0.94265   0.92150      5405

    accuracy                        0.89673      8405
   macro avg    0.89432   0.87832   0.88530      8405
weighted avg    0.89632   0.89673   0.89566      8405

Sentiment Analysis

Trained on 80% of the dataset and tested on the remaining 20%. All training sessions are stored in session/sentiment.

Graph based on F1-score.

display(Image('sentiment-accuracy.png', width=500))

multinomial

              precision    recall  f1-score   support

    negative    0.80689   0.82413   0.81542     80911
    positive    0.80372   0.78500   0.79425     74228

    accuracy                        0.80541    155139
   macro avg    0.80530   0.80456   0.80483    155139
weighted avg    0.80537   0.80541   0.80529    155139

bert-bahasa-base

              precision    recall  f1-score   support

    negative    0.82923   0.87643   0.85218     80965
    positive    0.85618   0.80299   0.82873     74174

    accuracy                        0.84132    155139
   macro avg    0.84271   0.83971   0.84046    155139
weighted avg    0.84212   0.84132   0.84097    155139

bert-bahasa-small

              precision    recall  f1-score   support

    negative    0.86186   0.82708   0.84411     80632
    positive    0.82069   0.85654   0.83823     74507

    accuracy                        0.84123    155139
   macro avg    0.84128   0.84181   0.84117    155139
weighted avg    0.84209   0.84123   0.84129    155139

xlnet-bahasa-base

              precision    recall  f1-score   support

    negative    0.80365   0.91349   0.85506     80959
    positive    0.88903   0.75642   0.81738     74180

    accuracy                        0.83838    155139
   macro avg    0.84634   0.83495   0.83622    155139
weighted avg    0.84447   0.83838   0.83704    155139

albert-bahasa-base

              precision    recall  f1-score   support

    negative    0.84067   0.80939   0.82473     81213
    positive    0.79883   0.83148   0.81483     73926

    accuracy                        0.81992    155139
   macro avg    0.81975   0.82044   0.81978    155139
weighted avg    0.82073   0.81992   0.82001    155139

Similarity

Trained on 80% of the dataset and tested on the remaining 20%. All training sessions are stored in session/similarity.

Graph based on F1-score.

display(Image('similarity-accuracy.png', width=500))

bert-bahasa-base

              precision    recall  f1-score   support

 not similar    0.89808   0.87787   0.88786     50881
     similar    0.79975   0.83039   0.81478     29886

    accuracy                        0.86030     80767
   macro avg    0.84892   0.85413   0.85132     80767
weighted avg    0.86170   0.86030   0.86082     80767

xlnet-bahasa-base

              precision    recall  f1-score   support

 not similar    0.80774   0.93228   0.86556     50919
     similar    0.84325   0.62145   0.71556     29848

    accuracy                        0.81741     80767
   macro avg    0.82550   0.77687   0.79056     80767
weighted avg    0.82086   0.81741   0.81012     80767

albert-bahasa-base

              precision    recall  f1-score   support

 not similar    0.88273   0.85781   0.87009     51052
     similar    0.76701   0.80421   0.78517     29715

    accuracy                        0.83809     80767
   macro avg    0.82487   0.83101   0.82763     80767
weighted avg    0.84015   0.83809   0.83885     80767

Subjectivity Analysis

Trained on 80% of the dataset and tested on the remaining 20%. All training sessions are stored in session/subjectivity.

Graph based on F1-score.

multinomial

              precision    recall  f1-score   support

    negative       0.91      0.85      0.88       999
    positive       0.86      0.92      0.89       994

   micro avg       0.89      0.89      0.89      1993
   macro avg       0.89      0.89      0.89      1993
weighted avg       0.89      0.89      0.89      1993

bert-bahasa-base

              precision    recall  f1-score   support

    negative    0.91856   0.90733   0.91291       982
    positive    0.91105   0.92186   0.91642      1011

    accuracy                        0.91470      1993
   macro avg    0.91480   0.91460   0.91467      1993
weighted avg    0.91475   0.91470   0.91469      1993

bert-bahasa-small

              precision    recall  f1-score   support

    negative    0.89731   0.92402   0.91047       974
    positive    0.92525   0.89892   0.91190      1019

    accuracy                        0.91119      1993
   macro avg    0.91128   0.91147   0.91118      1993
weighted avg    0.91160   0.91119   0.91120      1993

xlnet-bahasa-base

              precision    recall  f1-score   support

    negative    0.89741   0.91317   0.90522      1025
    positive    0.90632   0.88946   0.89781       968

    accuracy                        0.90166      1993
   macro avg    0.90186   0.90132   0.90152      1993
weighted avg    0.90174   0.90166   0.90162      1993

albert-bahasa-base

              precision    recall  f1-score   support

    negative    0.89970   0.89432   0.89700      1003
    positive    0.89357   0.89899   0.89627       990

    accuracy                        0.89664      1993
   macro avg    0.89664   0.89665   0.89664      1993
weighted avg    0.89666   0.89664   0.89664      1993

Toxicity Analysis

Trained on 80% of the dataset and tested on the remaining 20%. All training sessions are stored in session/toxic.

Graph based on F1-score.

display(Image('toxic-accuracy.png', width=500))

multinomial

               precision    recall  f1-score   support

        toxic    0.83711   0.33008   0.47347      3690
 severe_toxic    0.35664   0.13636   0.19729       374
      obscene    0.79276   0.31265   0.44845      2031
       threat    0.16667   0.05172   0.07895       116
       insult    0.70725   0.26941   0.39019      1919
identity_hate    0.28571   0.06077   0.10023       362

    micro avg    0.75516   0.28839   0.41738      8492
    macro avg    0.52436   0.19350   0.28143      8492
 weighted avg    0.74334   0.28839   0.41520      8492
  samples avg    0.02951   0.02374   0.02466      8492
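Unlike the other tasks, toxicity is multi-label: one comment may carry several labels at once, which is why these reports show micro avg and samples avg rows instead of a plain accuracy. A minimal sketch of the micro-averaged F1, which pools true positives, false positives, and false negatives across every label and sample before computing a single score (toy data, not drawn from the corpus):

```python
def micro_f1(pred_sets, gold_sets):
    # Pool counts across all samples and labels, then compute one F1
    # (this is what the "micro avg" row reports).
    tp = fp = fn = 0
    for pred, gold in zip(pred_sets, gold_sets):
        tp += len(pred & gold)  # labels predicted and present
        fp += len(pred - gold)  # labels predicted but absent
        fn += len(gold - pred)  # labels present but missed
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy multi-label predictions: each comment may carry several toxicity labels.
gold = [{'toxic'}, {'toxic', 'insult'}, set(), {'obscene'}]
pred = [{'toxic'}, {'toxic'}, {'threat'}, {'obscene'}]
print(micro_f1(pred, gold))  # 0.75
```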

bert-bahasa-base

               precision    recall  f1-score   support

        toxic    0.77604   0.73972   0.75745      3696
 severe_toxic    0.46594   0.44531   0.45539       384
      obscene    0.70845   0.75122   0.72921      2054
       threat    0.52525   0.50000   0.51232       104
       insult    0.72469   0.64050   0.68000      1911
identity_hate    0.56610   0.51385   0.53871       325

    micro avg    0.72273   0.69519   0.70869      8474
    macro avg    0.62775   0.59843   0.61218      8474
 weighted avg    0.72290   0.69519   0.70805      8474
  samples avg    0.06576   0.06529   0.06289      8474

bert-bahasa-small

               precision    recall  f1-score   support

        toxic    0.76917   0.77332   0.77124      3710
 severe_toxic    0.56126   0.36410   0.44168       390
      obscene    0.78999   0.70588   0.74557      2057
       threat    0.61842   0.41593   0.49735       113
       insult    0.71568   0.67955   0.69715      1941
identity_hate    0.66368   0.43402   0.52482       341

    micro avg    0.75060   0.69890   0.72383      8552
    macro avg    0.68637   0.56213   0.61297      8552
 weighted avg    0.74636   0.69890   0.71977      8552
  samples avg    0.06782   0.06616   0.06420      8552

xlnet-bahasa-base

               precision    recall  f1-score   support

        toxic    0.77923   0.76371   0.77139      3665
 severe_toxic    0.37925   0.55497   0.45058       382
      obscene    0.77055   0.76058   0.76553      2009
       threat    0.59036   0.40496   0.48039       121
       insult    0.68254   0.72612   0.70366      1895
identity_hate    0.52620   0.62432   0.57108       370

    micro avg    0.71437   0.73383   0.72397      8442
    macro avg    0.62135   0.63911   0.62377      8442
 weighted avg    0.72356   0.73383   0.72733      8442
  samples avg    0.06329   0.06815   0.06304      8442

albert-bahasa-base

               precision    recall  f1-score   support

        toxic    0.70172   0.75169   0.72585      3693
 severe_toxic    0.46209   0.33420   0.38788       383
      obscene    0.76764   0.74951   0.75847      2032
       threat    0.49296   0.34314   0.40462       102
       insult    0.67535   0.67606   0.67570      1880
identity_hate    0.67879   0.33333   0.44711       336

    micro avg    0.70126   0.69369   0.69745      8426
    macro avg    0.62976   0.53132   0.56660      8426
 weighted avg    0.69740   0.69369   0.69216      8426
  samples avg    0.06495   0.06556   0.06256      8426