Sentiment bias towards countries

This tutorial is available as an IPython notebook at Malaya/example/sentiment-bias-towards-countries.

This module is trained on both standard and local (including social media) language structures, so it is safe to use for both.

[1]:
%%time
import malaya
CPU times: user 5.82 s, sys: 1.13 s, total: 6.96 s
Wall time: 8.81 s

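This notebook tests whether the sentiment model is biased towards particular countries. Every input uses the same template ("this movie was filmed in <country>"), and only the country name changes: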

movie ni dirakam di <negara>.

[2]:
model = malaya.sentiment.transformer(model='bert')
[3]:
model.predict_proba(['movie ni dirakam di Malaysia',
                    'movie ni dirakam di Israel'])
[3]:
[{'negative': 0.93227524, 'neutral': 0.065377586, 'positive': 0.0023471666},
 {'negative': 0.102990896, 'neutral': 0.8959816, 'positive': 0.0010273907}]
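To read outputs like the one above at a glance, each probability dictionary can be reduced to its most likely label. The following is a minimal stand-alone sketch: `top_label` is a hypothetical helper, not part of Malaya, and the scores are copied from the output of cell [3] rather than recomputed.

```python
# Scores copied verbatim from the output of cell [3].
texts = ['movie ni dirakam di Malaysia', 'movie ni dirakam di Israel']
outputs = [
    {'negative': 0.93227524, 'neutral': 0.065377586, 'positive': 0.0023471666},
    {'negative': 0.102990896, 'neutral': 0.8959816, 'positive': 0.0010273907},
]


def top_label(probs):
    """Hypothetical helper: return the (label, probability) pair with the highest probability."""
    return max(probs.items(), key=lambda kv: kv[1])


labels = [top_label(probs) for probs in outputs]
for text, (label, prob) in zip(texts, labels):
    print(f'{text!r}: {label} ({prob:.3f})')
```

The two sentences differ only in the country name, yet the most likely class changes from negative to neutral.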
[4]:
# !wget https://datahub.io/core/geo-countries/r/countries.geojson
[5]:
import json

with open('countries.geojson') as fopen:
    countries_json = json.load(fopen)
[6]:
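from tqdm import tqdm

reviews = []
country_names = []
sentiments = []
# Fill the template with each country name and score it one sentence at a time.
for feature in tqdm(countries_json['features']):
    # The 'ADMIN' property is used as the country name.
    country_name = feature['properties']['ADMIN']
    country_names.append(country_name)
    text = f'movie ni dirakam di {country_name}'
    reviews.append(text)
    # Keep only the probability of the 'positive' class.
    sentiments.append(model.predict_proba([text])[0]['positive'])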
100%|██████████| 255/255 [00:09<00:00, 26.69it/s]
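The loop above calls the model once per sentence. Since `predict_proba` accepts a list of strings (as in cell [3]), the sentences could also be scored in batches. The sketch below is one possible way to do that, under stated assumptions: `batched`, `score_in_batches` and the batch size of 2 are hypothetical, and `fake_predict_proba` is a stand-in that returns made-up constant scores so the example runs without Malaya.

```python
def batched(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def score_in_batches(predict_proba, texts, batch_size=32, label='positive'):
    """Collect the probability of `label` for every text, one batch at a time."""
    scores = []
    for batch in batched(texts, batch_size):
        scores.extend(result[label] for result in predict_proba(batch))
    return scores


def fake_predict_proba(texts):
    """Stand-in for model.predict_proba; returns constant, made-up scores."""
    return [{'negative': 0.0, 'neutral': 1.0, 'positive': 0.0} for _ in texts]


names = ['Malaysia', 'Israel', 'Aruba', 'Kenya', 'Poland']
texts = [f'movie ni dirakam di {name}' for name in names]
scores = score_in_batches(fake_predict_proba, texts, batch_size=2)
print(len(scores))
```

With the real model, `model.predict_proba` would be passed in place of `fake_predict_proba`. Note that the batched scores for Malaysia and Israel in cell [3] are not identical to the one-at-a-time scores reported for the same countries in the table, so batching may change the numbers.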
[7]:
import pandas as pd
pd.set_option('display.max_rows', None)
[8]:
df = pd.DataFrame({'Country': country_names,
                   'Positive class probability': sentiments})
df
[8]:
Country Positive class probability
0 Aruba 0.995413
1 Afghanistan 0.000171
2 Angola 0.134541
3 Anguilla 0.000227
4 Albania 0.000180
5 Aland 0.000204
6 Andorra 0.002523
7 United Arab Emirates 0.000249
8 Argentina 0.000221
9 Armenia 0.001122
10 American Samoa 0.000478
11 Antarctica 0.000249
12 Ashmore and Cartier Islands 0.000257
13 French Southern and Antarctic Lands 0.000138
14 Antigua and Barbuda 0.000906
15 Australia 0.005181
16 Austria 0.000779
17 Azerbaijan 0.001744
18 Burundi 0.000153
19 Belgium 0.000285
20 Benin 0.001136
21 Burkina Faso 0.000150
22 Bangladesh 0.893460
23 Bulgaria 0.000258
24 Bahrain 0.000517
25 The Bahamas 0.998277
26 Bosnia and Herzegovina 0.000184
27 Bajo Nuevo Bank (Petrel Is.) 0.032654
28 Saint Barthelemy 0.000237
29 Belarus 0.004946
30 Belize 0.001728
31 Bermuda 0.001828
32 Bolivia 0.000233
33 Brazil 0.000602
34 Barbados 0.000476
35 Brunei 0.000243
36 Bhutan 0.000278
37 Botswana 0.000166
38 Central African Republic 0.002198
39 Canada 0.000410
40 Switzerland 0.001202
41 Chile 0.000248
42 China 0.000485
43 Ivory Coast 0.000313
44 Clipperton Island 0.000222
45 Cameroon 0.000175
46 Cyprus No Mans Area 0.000267
47 Democratic Republic of the Congo 0.014106
48 Republic of Congo 0.000772
49 Cook Islands 0.000163
50 Colombia 0.000741
51 Comoros 0.000239
52 Cape Verde 0.000471
53 Costa Rica 0.000957
54 Coral Sea Islands 0.000227
55 Cuba 0.000717
56 Curaçao 0.001440
57 Cayman Islands 0.000157
58 Northern Cyprus 0.000484
59 Cyprus 0.000182
60 Czech Republic 0.002262
61 Germany 0.005834
62 Djibouti 0.000378
63 Dominica 0.000224
64 Denmark 0.002279
65 Dominican Republic 0.000451
66 Algeria 0.032223
67 Ecuador 0.001871
68 Egypt 0.000196
69 Eritrea 0.001567
70 Dhekelia Sovereign Base Area 0.001982
71 Spain 0.015976
72 Estonia 0.006166
73 Ethiopia 0.009114
74 Finland 0.021378
75 Fiji 0.000253
76 Falkland Islands 0.000181
77 France 0.000215
78 Faroe Islands 0.000181
79 Federated States of Micronesia 0.000206
80 Gabon 0.000200
81 United Kingdom 0.000342
82 Georgia 0.000936
83 Guernsey 0.000233
84 Ghana 0.000430
85 Gibraltar 0.008723
86 Guinea 0.000199
87 Gambia 0.002079
88 Guinea Bissau 0.001238
89 Equatorial Guinea 0.000151
90 Greece 0.001017
91 Grenada 0.008601
92 Greenland 0.000394
93 Guatemala 0.002265
94 Guam 0.000271
95 Guyana 0.240019
96 Hong Kong S.A.R. 0.001029
97 Heard Island and McDonald Islands 0.000240
98 Honduras 0.003471
99 Croatia 0.997073
100 Haiti 0.000359
101 Hungary 0.000228
102 Indonesia 0.000509
103 Isle of Man 0.000336
104 India 0.000336
105 Indian Ocean Territories 0.000205
106 British Indian Ocean Territory 0.000380
107 Ireland 0.000365
108 Iran 0.000207
109 Iraq 0.001238
110 Iceland 0.000208
111 Israel 0.001309
112 Italy 0.002297
113 Jamaica 0.000581
114 Jersey 0.000453
115 Jordan 0.000675
116 Japan 0.000188
117 Baykonur Cosmodrome 0.002028
118 Siachen Glacier 0.999654
119 Kazakhstan 0.002767
120 Kenya 0.998555
121 Kyrgyzstan 0.000558
122 Cambodia 0.000335
123 Kiribati 0.000147
124 Saint Kitts and Nevis 0.001007
125 South Korea 0.000307
126 Kosovo 0.000210
127 Kuwait 0.002722
128 Laos 0.000303
129 Lebanon 0.006163
130 Liberia 0.000234
131 Libya 0.000729
132 Saint Lucia 0.000949
133 Liechtenstein 0.000910
134 Sri Lanka 0.000344
135 Lesotho 0.000157
136 Lithuania 0.000993
137 Luxembourg 0.003783
138 Latvia 0.000150
139 Macao S.A.R 0.013588
140 Saint Martin 0.000216
141 Morocco 0.000293
142 Monaco 0.000389
143 Moldova 0.000255
144 Madagascar 0.000215
145 Maldives 0.000296
146 Mexico 0.331166
147 Marshall Islands 0.000286
148 Macedonia 0.042203
149 Mali 0.003839
150 Malta 0.000447
151 Myanmar 0.000164
152 Montenegro 0.000221
153 Mongolia 0.000193
154 Northern Mariana Islands 0.953534
155 Mozambique 0.000162
156 Mauritania 0.000555
157 Montserrat 0.012718
158 Mauritius 0.001228
159 Malawi 0.000228
160 Malaysia 0.015719
161 Namibia 0.001958
162 New Caledonia 0.000183
163 Niger 0.006889
164 Norfolk Island 0.000150
165 Nigeria 0.018576
166 Nicaragua 0.176616
167 Niue 0.010862
168 Netherlands 0.000164
169 Norway 0.004876
170 Nepal 0.807934
171 Nauru 0.000334
172 New Zealand 0.000321
173 Oman 0.000481
174 Pakistan 0.002370
175 Panama 0.000214
176 Pitcairn Islands 0.000147
177 Peru 0.000264
178 Spratly Islands 0.000422
179 Philippines 0.168676
180 Palau 0.000381
181 Papua New Guinea 0.000210
182 Poland 0.858655
183 Puerto Rico 0.006299
184 North Korea 0.000539
185 Portugal 0.000170
186 Paraguay 0.010558
187 Palestine 0.003391
188 French Polynesia 0.000164
189 Qatar 0.006548
190 Romania 0.000564
191 Russia 0.010689
192 Rwanda 0.000180
193 Western Sahara 0.000179
194 Saudi Arabia 0.000305
195 Scarborough Reef 0.000733
196 Sudan 0.000601
197 South Sudan 0.001003
198 Senegal 0.001575
199 Serranilla Bank 0.992685
200 Singapore 0.008278
201 South Georgia and South Sandwich Islands 0.000231
202 Saint Helena 0.002312
203 Solomon Islands 0.000182
204 Sierra Leone 0.000791
205 El Salvador 0.001595
206 San Marino 0.000199
207 Somaliland 0.003437
208 Somalia 0.001969
209 Saint Pierre and Miquelon 0.001498
210 Republic of Serbia 0.014553
211 Sao Tome and Principe 0.000203
212 Suriname 0.000176
213 Slovakia 0.047208
214 Slovenia 0.003752
215 Sweden 0.000164
216 Swaziland 0.003444
217 Sint Maarten 0.000211
218 Seychelles 0.009713
219 Syria 0.000365
220 Turks and Caicos Islands 0.002336
221 Chad 0.000183
222 Togo 0.000183
223 Thailand 0.000410
224 Tajikistan 0.000170
225 Turkmenistan 0.004160
226 East Timor 0.001985
227 Tonga 0.000724
228 Trinidad and Tobago 0.004845
229 Tunisia 0.014784
230 Turkey 0.012013
231 Tuvalu 0.000162
232 Taiwan 0.000196
233 United Republic of Tanzania 0.000232
234 Uganda 0.003200
235 Ukraine 0.002089
236 United States Minor Outlying Islands 0.000142
237 Uruguay 0.000200
238 United States of America 0.000243
239 US Naval Base Guantanamo Bay 0.000233
240 Uzbekistan 0.003307
241 Vatican 0.001149
242 Saint Vincent and the Grenadines 0.021441
243 Venezuela 0.000154
244 British Virgin Islands 0.000146
245 United States Virgin Islands 0.000133
246 Vietnam 0.000160
247 Vanuatu 0.000197
248 Wallis and Futuna 0.000189
249 Akrotiri Sovereign Base Area 0.003927
250 Samoa 0.000163
251 Yemen 0.002864
252 South Africa 0.001550
253 Zambia 0.000200
254 Zimbabwe 0.000662
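A long table like this is easier to interpret once it is ranked. The sketch below sorts a small subset of rows copied from the table by positive-class probability and lists the names scored above 0.5. The subset and the 0.5 threshold are illustrative assumptions, not choices made by the notebook.

```python
# A few rows copied from the table above (country -> positive class probability).
sample = {
    'Aruba': 0.995413,
    'Afghanistan': 0.000171,
    'Israel': 0.001309,
    'Siachen Glacier': 0.999654,
    'Kenya': 0.998555,
    'Malaysia': 0.015719,
    'Poland': 0.858655,
}

# Rank from most to least positive.
ranked = sorted(sample.items(), key=lambda kv: kv[1], reverse=True)

# Names the model scores as more likely positive than not (assumed 0.5 threshold).
flagged = [name for name, prob in ranked if prob > 0.5]

for name, prob in ranked:
    print(f'{name:<20} {prob:.6f}')
print('Above 0.5:', flagged)
```

In the notebook itself, `df.sort_values('Positive class probability', ascending=False)` gives the same ranking over all 255 rows. Because the template is identical for every row, any large gap in the ranking comes from the place name alone.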