Sentiment bias towards countries
This tutorial is available as an IPython notebook at Malaya/example/sentiment-bias-towards-countries.
This module was trained on both standard and local (including social media) language structures, so it is safe to use for both.
[1]:
%%time
import malaya
CPU times: user 2.9 s, sys: 3.84 s, total: 6.74 s
Wall time: 1.97 s
/home/husein/dev/malaya/malaya/tokenizer.py:214: FutureWarning: Possible nested set at position 3397
self.tok = re.compile(r'({})'.format('|'.join(pipeline)))
/home/husein/dev/malaya/malaya/tokenizer.py:214: FutureWarning: Possible nested set at position 3927
self.tok = re.compile(r'({})'.format('|'.join(pipeline)))
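This notebook tests whether the sentiment model is biased towards particular countries. Every input uses the same template text (roughly, "this movie was filmed in <country>"), so only the country name changes:
movie ni dirakam di <negara>.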
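Because every probe shares one template, any difference in predicted sentiment can be attributed to the country name alone. A minimal sketch of generating such probes follows; `TEMPLATE` and `build_probes` are hypothetical names introduced here for illustration and are not part of Malaya.

```python
# Hypothetical helper, not part of Malaya: fill one fixed template
# with each country name so the name is the only token that varies.
TEMPLATE = 'movie ni dirakam di {negara}'


def build_probes(countries, template=TEMPLATE):
    """Return one probe sentence per country name."""
    return [template.format(negara=name) for name in countries]


probes = build_probes(['Malaysia', 'Israel'])
print(probes)
```

Keeping the template in one place makes it easy to repeat the experiment with a different carrier sentence.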
[2]:
model = malaya.sentiment.huggingface()
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[4]:
model.predict_proba(['movie ni dirakam di Malaysia',
                     'movie ni dirakam di Israel'])
[4]:
[{'negative': 0.971319854259491,
'neutral': 0.019404958933591843,
'positive': 0.009275294840335846},
{'negative': 0.8590973615646362,
'neutral': 0.040735069662332535,
'positive': 0.10016759485006332}]
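To make the gap between the two predictions explicit, the per-class difference can be computed directly. The sketch below copies the probabilities from the output above; the variable names are illustrative only.

```python
# Probabilities copied from the predict_proba output above.
malaysia = {'negative': 0.971319854259491,
            'neutral': 0.019404958933591843,
            'positive': 0.009275294840335846}
israel = {'negative': 0.8590973615646362,
          'neutral': 0.040735069662332535,
          'positive': 0.10016759485006332}

# Per-class difference (Israel minus Malaysia); the sentences differ
# only in the country name, so this gap is driven by the name.
gap = {label: israel[label] - malaysia[label] for label in malaysia}

for label, delta in gap.items():
    print(f'{label}: {delta:+.4f}')
```

Both sentences are classified as negative, but the positive-class probability differs by about 0.09 between them.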
[9]:
# !wget https://datahub.io/core/geo-countries/r/countries.geojson
[10]:
import json
with open('countries.geojson') as fopen:
    countries_json = json.load(fopen)
[11]:
from tqdm import tqdm

reviews = []
country_names = []
sentiments = []
for feature in tqdm(countries_json['features']):
    # 'ADMIN' holds the country or territory name in this GeoJSON file.
    country_name = feature['properties']['ADMIN']
    country_names.append(country_name)
    text = f'movie ni dirakam di {country_name}'
    reviews.append(text)
    # Keep only the positive-class probability for each probe sentence.
    sentiments.append(model.predict_proba([text])[0]['positive'])
100%|████████████████████████████████████████| 255/255 [00:01<00:00, 140.87it/s]
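The loop above calls `predict_proba` once per sentence. Since `predict_proba` accepts a list of strings, as in the two-sentence call earlier, the same scores could also be collected in batches. The sketch below is one possible way to do that: `chunks`, `positive_scores`, and `fake_predict_proba` are hypothetical names, and `fake_predict_proba` is only a stand-in that mimics the list-of-dicts output shape so the example runs without Malaya.

```python
def chunks(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def positive_scores(predict_proba, texts, batch_size=32):
    """Collect the 'positive' probability for every text, batch by batch."""
    scores = []
    for batch in chunks(texts, batch_size):
        scores.extend(row['positive'] for row in predict_proba(batch))
    return scores


# Stand-in for model.predict_proba (an assumption for this sketch):
# returns dummy values in the same list-of-dicts shape shown earlier.
def fake_predict_proba(batch):
    return [{'negative': 0.5, 'neutral': 0.3, 'positive': 0.2} for _ in batch]


texts = [f'movie ni dirakam di {name}'
         for name in ['Malaysia', 'Israel', 'Bhutan']]
scores = positive_scores(fake_predict_proba, texts, batch_size=2)
print(scores)
```

With the real model, `positive_scores(model.predict_proba, reviews)` would be the equivalent call; whether batching is faster depends on the model and hardware.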
[12]:
import pandas as pd
pd.set_option('display.max_rows', None)
[13]:
df = pd.DataFrame({'Country': country_names,
                   'Positive class probability': sentiments})
df
[13]:
| | Country | Positive class probability |
|---|---|---|
| 0 | Aruba | 0.039369 |
| 1 | Afghanistan | 0.169634 |
| 2 | Angola | 0.144256 |
| 3 | Anguilla | 0.356770 |
| 4 | Albania | 0.023863 |
| 5 | Aland | 0.044475 |
| 6 | Andorra | 0.130069 |
| 7 | United Arab Emirates | 0.050178 |
| 8 | Argentina | 0.227486 |
| 9 | Armenia | 0.029338 |
| 10 | American Samoa | 0.616004 |
| 11 | Antarctica | 0.009058 |
| 12 | Ashmore and Cartier Islands | 0.079729 |
| 13 | French Southern and Antarctic Lands | 0.027264 |
| 14 | Antigua and Barbuda | 0.105469 |
| 15 | Australia | 0.067188 |
| 16 | Austria | 0.616803 |
| 17 | Azerbaijan | 0.030310 |
| 18 | Burundi | 0.038826 |
| 19 | Belgium | 0.116550 |
| 20 | Benin | 0.073243 |
| 21 | Burkina Faso | 0.028318 |
| 22 | Bangladesh | 0.079000 |
| 23 | Bulgaria | 0.655491 |
| 24 | Bahrain | 0.036841 |
| 25 | The Bahamas | 0.012471 |
| 26 | Bosnia and Herzegovina | 0.106377 |
| 27 | Bajo Nuevo Bank (Petrel Is.) | 0.887322 |
| 28 | Saint Barthelemy | 0.153432 |
| 29 | Belarus | 0.038862 |
| 30 | Belize | 0.097082 |
| 31 | Bermuda | 0.047066 |
| 32 | Bolivia | 0.202820 |
| 33 | Brazil | 0.123829 |
| 34 | Barbados | 0.373711 |
| 35 | Brunei | 0.380603 |
| 36 | Bhutan | 0.974151 |
| 37 | Botswana | 0.064480 |
| 38 | Central African Republic | 0.412737 |
| 39 | Canada | 0.034126 |
| 40 | Switzerland | 0.013300 |
| 41 | Chile | 0.455905 |
| 42 | China | 0.038819 |
| 43 | Ivory Coast | 0.060523 |
| 44 | Clipperton Island | 0.941178 |
| 45 | Cameroon | 0.302088 |
| 46 | Cyprus No Mans Area | 0.005227 |
| 47 | Democratic Republic of the Congo | 0.989785 |
| 48 | Republic of Congo | 0.992472 |
| 49 | Cook Islands | 0.781634 |
| 50 | Colombia | 0.078325 |
| 51 | Comoros | 0.552591 |
| 52 | Cape Verde | 0.026867 |
| 53 | Costa Rica | 0.146759 |
| 54 | Coral Sea Islands | 0.143805 |
| 55 | Cuba | 0.034538 |
| 56 | Curaçao | 0.142810 |
| 57 | Cayman Islands | 0.140016 |
| 58 | Northern Cyprus | 0.078509 |
| 59 | Cyprus | 0.048920 |
| 60 | Czech Republic | 0.528698 |
| 61 | Germany | 0.039214 |
| 62 | Djibouti | 0.038966 |
| 63 | Dominica | 0.185570 |
| 64 | Denmark | 0.431805 |
| 65 | Dominican Republic | 0.201829 |
| 66 | Algeria | 0.034873 |
| 67 | Ecuador | 0.128374 |
| 68 | Egypt | 0.246791 |
| 69 | Eritrea | 0.516720 |
| 70 | Dhekelia Sovereign Base Area | 0.002587 |
| 71 | Spain | 0.069895 |
| 72 | Estonia | 0.003157 |
| 73 | Ethiopia | 0.049434 |
| 74 | Finland | 0.229206 |
| 75 | Fiji | 0.033329 |
| 76 | Falkland Islands | 0.030577 |
| 77 | France | 0.042942 |
| 78 | Faroe Islands | 0.098745 |
| 79 | Federated States of Micronesia | 0.000912 |
| 80 | Gabon | 0.463258 |
| 81 | United Kingdom | 0.009613 |
| 82 | Georgia | 0.028626 |
| 83 | Guernsey | 0.167616 |
| 84 | Ghana | 0.208999 |
| 85 | Gibraltar | 0.013775 |
| 86 | Guinea | 0.018618 |
| 87 | Gambia | 0.103650 |
| 88 | Guinea Bissau | 0.022831 |
| 89 | Equatorial Guinea | 0.014294 |
| 90 | Greece | 0.296132 |
| 91 | Grenada | 0.140380 |
| 92 | Greenland | 0.377565 |
| 93 | Guatemala | 0.039723 |
| 94 | Guam | 0.046184 |
| 95 | Guyana | 0.091745 |
| 96 | Hong Kong S.A.R. | 0.135367 |
| 97 | Heard Island and McDonald Islands | 0.747990 |
| 98 | Honduras | 0.102065 |
| 99 | Croatia | 0.055273 |
| 100 | Haiti | 0.042153 |
| 101 | Hungary | 0.667065 |
| 102 | Indonesia | 0.211004 |
| 103 | Isle of Man | 0.166882 |
| 104 | India | 0.089701 |
| 105 | Indian Ocean Territories | 0.012646 |
| 106 | British Indian Ocean Territory | 0.002079 |
| 107 | Ireland | 0.367753 |
| 108 | Iran | 0.036559 |
| 109 | Iraq | 0.168938 |
| 110 | Iceland | 0.039860 |
| 111 | Israel | 0.100167 |
| 112 | Italy | 0.046229 |
| 113 | Jamaica | 0.064513 |
| 114 | Jersey | 0.146864 |
| 115 | Jordan | 0.043022 |
| 116 | Japan | 0.042664 |
| 117 | Baykonur Cosmodrome | 0.060239 |
| 118 | Siachen Glacier | 0.778532 |
| 119 | Kazakhstan | 0.082745 |
| 120 | Kenya | 0.210720 |
| 121 | Kyrgyzstan | 0.386843 |
| 122 | Cambodia | 0.133686 |
| 123 | Kiribati | 0.095092 |
| 124 | Saint Kitts and Nevis | 0.615573 |
| 125 | South Korea | 0.859220 |
| 126 | Kosovo | 0.308428 |
| 127 | Kuwait | 0.040864 |
| 128 | Laos | 0.042060 |
| 129 | Lebanon | 0.096411 |
| 130 | Liberia | 0.240108 |
| 131 | Libya | 0.083740 |
| 132 | Saint Lucia | 0.506887 |
| 133 | Liechtenstein | 0.092944 |
| 134 | Sri Lanka | 0.083067 |
| 135 | Lesotho | 0.114256 |
| 136 | Lithuania | 0.118200 |
| 137 | Luxembourg | 0.030526 |
| 138 | Latvia | 0.166041 |
| 139 | Macao S.A.R | 0.054484 |
| 140 | Saint Martin | 0.480976 |
| 141 | Morocco | 0.014966 |
| 142 | Monaco | 0.090793 |
| 143 | Moldova | 0.038843 |
| 144 | Madagascar | 0.699353 |
| 145 | Maldives | 0.086702 |
| 146 | Mexico | 0.013596 |
| 147 | Marshall Islands | 0.145141 |
| 148 | Macedonia | 0.016141 |
| 149 | Mali | 0.028769 |
| 150 | Malta | 0.049830 |
| 151 | Myanmar | 0.168810 |
| 152 | Montenegro | 0.080645 |
| 153 | Mongolia | 0.050451 |
| 154 | Northern Mariana Islands | 0.083891 |
| 155 | Mozambique | 0.052984 |
| 156 | Mauritania | 0.047578 |
| 157 | Montserrat | 0.024121 |
| 158 | Mauritius | 0.066048 |
| 159 | Malawi | 0.208873 |
| 160 | Malaysia | 0.009275 |
| 161 | Namibia | 0.020265 |
| 162 | New Caledonia | 0.064238 |
| 163 | Niger | 0.020084 |
| 164 | Norfolk Island | 0.247524 |
| 165 | Nigeria | 0.067220 |
| 166 | Nicaragua | 0.053066 |
| 167 | Niue | 0.058268 |
| 168 | Netherlands | 0.029782 |
| 169 | Norway | 0.661565 |
| 170 | Nepal | 0.530866 |
| 171 | Nauru | 0.046705 |
| 172 | New Zealand | 0.063832 |
| 173 | Oman | 0.056455 |
| 174 | Pakistan | 0.088653 |
| 175 | Panama | 0.169475 |
| 176 | Pitcairn Islands | 0.282857 |
| 177 | Peru | 0.051750 |
| 178 | Spratly Islands | 0.026154 |
| 179 | Philippines | 0.087797 |
| 180 | Palau | 0.176909 |
| 181 | Papua New Guinea | 0.026789 |
| 182 | Poland | 0.062732 |
| 183 | Puerto Rico | 0.222873 |
| 184 | North Korea | 0.315357 |
| 185 | Portugal | 0.122907 |
| 186 | Paraguay | 0.141607 |
| 187 | Palestine | 0.265100 |
| 188 | French Polynesia | 0.061339 |
| 189 | Qatar | 0.097631 |
| 190 | Romania | 0.743275 |
| 191 | Russia | 0.006057 |
| 192 | Rwanda | 0.123763 |
| 193 | Western Sahara | 0.054943 |
| 194 | Saudi Arabia | 0.188411 |
| 195 | Scarborough Reef | 0.069580 |
| 196 | Sudan | 0.145442 |
| 197 | South Sudan | 0.182804 |
| 198 | Senegal | 0.041048 |
| 199 | Serranilla Bank | 0.404349 |
| 200 | Singapore | 0.266856 |
| 201 | South Georgia and South Sandwich Islands | 0.149831 |
| 202 | Saint Helena | 0.428387 |
| 203 | Solomon Islands | 0.344656 |
| 204 | Sierra Leone | 0.100867 |
| 205 | El Salvador | 0.176265 |
| 206 | San Marino | 0.038682 |
| 207 | Somaliland | 0.040974 |
| 208 | Somalia | 0.044995 |
| 209 | Saint Pierre and Miquelon | 0.699458 |
| 210 | Republic of Serbia | 0.042408 |
| 211 | Sao Tome and Principe | 0.915933 |
| 212 | Suriname | 0.041542 |
| 213 | Slovakia | 0.113862 |
| 214 | Slovenia | 0.753359 |
| 215 | Sweden | 0.001290 |
| 216 | Swaziland | 0.288331 |
| 217 | Sint Maarten | 0.081521 |
| 218 | Seychelles | 0.126393 |
| 219 | Syria | 0.028406 |
| 220 | Turks and Caicos Islands | 0.012284 |
| 221 | Chad | 0.067085 |
| 222 | Togo | 0.065655 |
| 223 | Thailand | 0.034735 |
| 224 | Tajikistan | 0.023857 |
| 225 | Turkmenistan | 0.006858 |
| 226 | East Timor | 0.038230 |
| 227 | Tonga | 0.345105 |
| 228 | Trinidad and Tobago | 0.020336 |
| 229 | Tunisia | 0.174806 |
| 230 | Turkey | 0.047196 |
| 231 | Tuvalu | 0.041594 |
| 232 | Taiwan | 0.121632 |
| 233 | United Republic of Tanzania | 0.922501 |
| 234 | Uganda | 0.066297 |
| 235 | Ukraine | 0.088036 |
| 236 | United States Minor Outlying Islands | 0.061749 |
| 237 | Uruguay | 0.025120 |
| 238 | United States of America | 0.098839 |
| 239 | US Naval Base Guantanamo Bay | 0.013662 |
| 240 | Uzbekistan | 0.090956 |
| 241 | Vatican | 0.024224 |
| 242 | Saint Vincent and the Grenadines | 0.798123 |
| 243 | Venezuela | 0.164420 |
| 244 | British Virgin Islands | 0.070114 |
| 245 | United States Virgin Islands | 0.022017 |
| 246 | Vietnam | 0.051927 |
| 247 | Vanuatu | 0.071065 |
| 248 | Wallis and Futuna | 0.026419 |
| 249 | Akrotiri Sovereign Base Area | 0.003673 |
| 250 | Samoa | 0.153828 |
| 251 | Yemen | 0.044515 |
| 252 | South Africa | 0.435813 |
| 253 | Zambia | 0.051639 |
| 254 | Zimbabwe | 0.044964 |
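A table this long is easier to read once it is ranked. The sketch below sorts a handful of rows copied from the table above and reports the spread between the highest and lowest score in that sample; it uses only the standard library, and the `sample` dictionary is not the full result set. On the full DataFrame, `df.sort_values('Positive class probability')` would give the same kind of ordering.

```python
# A few rows copied from the table above (not the full result set).
sample = {
    'Republic of Congo': 0.992472,
    'Democratic Republic of the Congo': 0.989785,
    'Bhutan': 0.974151,
    'Israel': 0.100167,
    'Malaysia': 0.009275,
    'Sweden': 0.001290,
    'Federated States of Micronesia': 0.000912,
}

# Rank from most to least positive.
ranked = sorted(sample.items(), key=lambda item: item[1], reverse=True)

# Spread between the extremes of this sample.
spread = ranked[0][1] - ranked[-1][1]

for country, score in ranked:
    print(f'{score:.6f}  {country}')
print(f'spread: {spread:.6f}')
```

The identical carrier sentence scores anywhere from under 0.001 to over 0.99 on the positive class depending only on the place name, which is the bias this notebook sets out to expose.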