Sentiment bias towards countries
This tutorial is available as an IPython notebook at Malaya/example/sentiment-bias-towards-countries.
This module was trained on both standard and local (including social media) language structures, so it is safe to use for both.
[1]:
%%time
import malaya
CPU times: user 2.9 s, sys: 3.84 s, total: 6.74 s
Wall time: 1.97 s
/home/husein/dev/malaya/malaya/tokenizer.py:214: FutureWarning: Possible nested set at position 3397
self.tok = re.compile(r'({})'.format('|'.join(pipeline)))
/home/husein/dev/malaya/malaya/tokenizer.py:214: FutureWarning: Possible nested set at position 3927
self.tok = re.compile(r'({})'.format('|'.join(pipeline)))
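This notebook tests the sentiment model for bias towards countries using the template "movie ni dirakam di <negara>" (roughly "this movie was filmed in <country>"). Only the country name changes between inputs, so any difference in the predicted sentiment comes from the country name alone.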
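The probe fills the <negara> slot with each country name and leaves the rest of the sentence untouched. A minimal sketch of that substitution, assuming the placeholder is written literally as <negara>; the helper fill_template is a hypothetical name, not part of Malaya:

```python
# Hypothetical helper: fill the <negara> ("country") slot of a fixed template.
TEMPLATE = 'movie ni dirakam di <negara>'

def fill_template(template, names, slot='<negara>'):
    """Return one probe sentence per name, identical except for the slot."""
    return [template.replace(slot, name) for name in names]

probes = fill_template(TEMPLATE, ['Malaysia', 'Israel'])
print(probes)  # ['movie ni dirakam di Malaysia', 'movie ni dirakam di Israel']
```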
[2]:
model = malaya.sentiment.huggingface()
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[4]:
model.predict_proba(['movie ni dirakam di Malaysia',
'movie ni dirakam di Israel'])
[4]:
[{'negative': 0.971319854259491,
'neutral': 0.019404958933591843,
'positive': 0.009275294840335846},
{'negative': 0.8590973615646362,
'neutral': 0.040735069662332535,
'positive': 0.10016759485006332}]
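Because the two sentences differ only in the country name, the gap between their scores is a direct, if crude, measure of bias. A small sketch that takes the two dictionaries printed above (copied by hand, not recomputed) and reports the top label of each plus the gap in positive probability; top_label is a hypothetical helper:

```python
# Scores copied from the predict_proba output above.
malaysia = {'negative': 0.971319854259491,
            'neutral': 0.019404958933591843,
            'positive': 0.009275294840335846}
israel = {'negative': 0.8590973615646362,
          'neutral': 0.040735069662332535,
          'positive': 0.10016759485006332}

def top_label(probs):
    """Return the label with the highest probability."""
    return max(probs, key=probs.get)

print(top_label(malaysia), top_label(israel))  # negative negative

# Same template, different country: this gap is attributable to the name.
gap = israel['positive'] - malaysia['positive']
print(f'positive-probability gap: {gap:.4f}')  # positive-probability gap: 0.0909
```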
[9]:
# !wget https://datahub.io/core/geo-countries/r/countries.geojson
[10]:
import json

with open('countries.geojson') as fopen:
    countries_json = json.load(fopen)
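The loop in the next cell reads each country name from feature['properties']['ADMIN']. A GeoJSON FeatureCollection is an object with a "features" list, and each feature carries a "properties" mapping. The sketch below uses a tiny made-up sample (geometry set to null, and only the ADMIN property this notebook relies on) to show that shape and the extraction:

```python
import json

# Tiny made-up FeatureCollection with the same shape as countries.geojson;
# only the ADMIN property used by this notebook is included.
sample = json.loads('''
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature", "properties": {"ADMIN": "Aruba"}, "geometry": null},
    {"type": "Feature", "properties": {"ADMIN": "Afghanistan"}, "geometry": null}
  ]
}
''')

names = [feature['properties']['ADMIN'] for feature in sample['features']]
print(names)  # ['Aruba', 'Afghanistan']
```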
[11]:
from tqdm import tqdm

reviews = []
country_names = []
sentiments = []
for feature in tqdm(countries_json['features']):
    country_name = feature['properties']['ADMIN']
    country_names.append(country_name)
    text = f'movie ni dirakam di {country_name}'
    reviews.append(text)
    # keep only the probability of the 'positive' class
    sentiments.append(model.predict_proba([text])[0]['positive'])
100%|████████████████████████████████████████| 255/255 [00:01<00:00, 140.87it/s]
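The loop above calls predict_proba once per country. Since predict_proba accepts a list of strings (as in the Malaysia/Israel cell), the sentences could instead be scored in chunks. A minimal sketch under stated assumptions: FakeModel and positive_scores, including its batch_size argument, are hypothetical names so the snippet runs without downloading the model, and whether chunking is actually faster depends on how Malaya batches internally:

```python
class FakeModel:
    """Hypothetical stand-in mimicking only the shape of predict_proba output."""
    def predict_proba(self, texts):
        return [{'negative': 0.5, 'neutral': 0.25, 'positive': 0.25}
                for _ in texts]

def positive_scores(model, texts, batch_size=64):
    """Score texts in chunks and keep only the 'positive' probability."""
    scores = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        scores.extend(p['positive'] for p in model.predict_proba(batch))
    return scores

texts = [f'movie ni dirakam di {name}' for name in ['Aruba', 'Afghanistan', 'Angola']]
print(positive_scores(FakeModel(), texts, batch_size=2))  # [0.25, 0.25, 0.25]
```

With the notebook's objects, building reviews first and then calling something like positive_scores(model, reviews) would replace the per-sentence calls while keeping the scores in the same order as country_names.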
[12]:
import pandas as pd
pd.set_option('display.max_rows', None)
[13]:
df = pd.DataFrame({'Country': country_names,
                   'Positive class probability': sentiments})
df
[13]:
| | Country | Positive class probability |
---|---|---|
0 | Aruba | 0.039369 |
1 | Afghanistan | 0.169634 |
2 | Angola | 0.144256 |
3 | Anguilla | 0.356770 |
4 | Albania | 0.023863 |
5 | Aland | 0.044475 |
6 | Andorra | 0.130069 |
7 | United Arab Emirates | 0.050178 |
8 | Argentina | 0.227486 |
9 | Armenia | 0.029338 |
10 | American Samoa | 0.616004 |
11 | Antarctica | 0.009058 |
12 | Ashmore and Cartier Islands | 0.079729 |
13 | French Southern and Antarctic Lands | 0.027264 |
14 | Antigua and Barbuda | 0.105469 |
15 | Australia | 0.067188 |
16 | Austria | 0.616803 |
17 | Azerbaijan | 0.030310 |
18 | Burundi | 0.038826 |
19 | Belgium | 0.116550 |
20 | Benin | 0.073243 |
21 | Burkina Faso | 0.028318 |
22 | Bangladesh | 0.079000 |
23 | Bulgaria | 0.655491 |
24 | Bahrain | 0.036841 |
25 | The Bahamas | 0.012471 |
26 | Bosnia and Herzegovina | 0.106377 |
27 | Bajo Nuevo Bank (Petrel Is.) | 0.887322 |
28 | Saint Barthelemy | 0.153432 |
29 | Belarus | 0.038862 |
30 | Belize | 0.097082 |
31 | Bermuda | 0.047066 |
32 | Bolivia | 0.202820 |
33 | Brazil | 0.123829 |
34 | Barbados | 0.373711 |
35 | Brunei | 0.380603 |
36 | Bhutan | 0.974151 |
37 | Botswana | 0.064480 |
38 | Central African Republic | 0.412737 |
39 | Canada | 0.034126 |
40 | Switzerland | 0.013300 |
41 | Chile | 0.455905 |
42 | China | 0.038819 |
43 | Ivory Coast | 0.060523 |
44 | Clipperton Island | 0.941178 |
45 | Cameroon | 0.302088 |
46 | Cyprus No Mans Area | 0.005227 |
47 | Democratic Republic of the Congo | 0.989785 |
48 | Republic of Congo | 0.992472 |
49 | Cook Islands | 0.781634 |
50 | Colombia | 0.078325 |
51 | Comoros | 0.552591 |
52 | Cape Verde | 0.026867 |
53 | Costa Rica | 0.146759 |
54 | Coral Sea Islands | 0.143805 |
55 | Cuba | 0.034538 |
56 | Curaçao | 0.142810 |
57 | Cayman Islands | 0.140016 |
58 | Northern Cyprus | 0.078509 |
59 | Cyprus | 0.048920 |
60 | Czech Republic | 0.528698 |
61 | Germany | 0.039214 |
62 | Djibouti | 0.038966 |
63 | Dominica | 0.185570 |
64 | Denmark | 0.431805 |
65 | Dominican Republic | 0.201829 |
66 | Algeria | 0.034873 |
67 | Ecuador | 0.128374 |
68 | Egypt | 0.246791 |
69 | Eritrea | 0.516720 |
70 | Dhekelia Sovereign Base Area | 0.002587 |
71 | Spain | 0.069895 |
72 | Estonia | 0.003157 |
73 | Ethiopia | 0.049434 |
74 | Finland | 0.229206 |
75 | Fiji | 0.033329 |
76 | Falkland Islands | 0.030577 |
77 | France | 0.042942 |
78 | Faroe Islands | 0.098745 |
79 | Federated States of Micronesia | 0.000912 |
80 | Gabon | 0.463258 |
81 | United Kingdom | 0.009613 |
82 | Georgia | 0.028626 |
83 | Guernsey | 0.167616 |
84 | Ghana | 0.208999 |
85 | Gibraltar | 0.013775 |
86 | Guinea | 0.018618 |
87 | Gambia | 0.103650 |
88 | Guinea Bissau | 0.022831 |
89 | Equatorial Guinea | 0.014294 |
90 | Greece | 0.296132 |
91 | Grenada | 0.140380 |
92 | Greenland | 0.377565 |
93 | Guatemala | 0.039723 |
94 | Guam | 0.046184 |
95 | Guyana | 0.091745 |
96 | Hong Kong S.A.R. | 0.135367 |
97 | Heard Island and McDonald Islands | 0.747990 |
98 | Honduras | 0.102065 |
99 | Croatia | 0.055273 |
100 | Haiti | 0.042153 |
101 | Hungary | 0.667065 |
102 | Indonesia | 0.211004 |
103 | Isle of Man | 0.166882 |
104 | India | 0.089701 |
105 | Indian Ocean Territories | 0.012646 |
106 | British Indian Ocean Territory | 0.002079 |
107 | Ireland | 0.367753 |
108 | Iran | 0.036559 |
109 | Iraq | 0.168938 |
110 | Iceland | 0.039860 |
111 | Israel | 0.100167 |
112 | Italy | 0.046229 |
113 | Jamaica | 0.064513 |
114 | Jersey | 0.146864 |
115 | Jordan | 0.043022 |
116 | Japan | 0.042664 |
117 | Baykonur Cosmodrome | 0.060239 |
118 | Siachen Glacier | 0.778532 |
119 | Kazakhstan | 0.082745 |
120 | Kenya | 0.210720 |
121 | Kyrgyzstan | 0.386843 |
122 | Cambodia | 0.133686 |
123 | Kiribati | 0.095092 |
124 | Saint Kitts and Nevis | 0.615573 |
125 | South Korea | 0.859220 |
126 | Kosovo | 0.308428 |
127 | Kuwait | 0.040864 |
128 | Laos | 0.042060 |
129 | Lebanon | 0.096411 |
130 | Liberia | 0.240108 |
131 | Libya | 0.083740 |
132 | Saint Lucia | 0.506887 |
133 | Liechtenstein | 0.092944 |
134 | Sri Lanka | 0.083067 |
135 | Lesotho | 0.114256 |
136 | Lithuania | 0.118200 |
137 | Luxembourg | 0.030526 |
138 | Latvia | 0.166041 |
139 | Macao S.A.R | 0.054484 |
140 | Saint Martin | 0.480976 |
141 | Morocco | 0.014966 |
142 | Monaco | 0.090793 |
143 | Moldova | 0.038843 |
144 | Madagascar | 0.699353 |
145 | Maldives | 0.086702 |
146 | Mexico | 0.013596 |
147 | Marshall Islands | 0.145141 |
148 | Macedonia | 0.016141 |
149 | Mali | 0.028769 |
150 | Malta | 0.049830 |
151 | Myanmar | 0.168810 |
152 | Montenegro | 0.080645 |
153 | Mongolia | 0.050451 |
154 | Northern Mariana Islands | 0.083891 |
155 | Mozambique | 0.052984 |
156 | Mauritania | 0.047578 |
157 | Montserrat | 0.024121 |
158 | Mauritius | 0.066048 |
159 | Malawi | 0.208873 |
160 | Malaysia | 0.009275 |
161 | Namibia | 0.020265 |
162 | New Caledonia | 0.064238 |
163 | Niger | 0.020084 |
164 | Norfolk Island | 0.247524 |
165 | Nigeria | 0.067220 |
166 | Nicaragua | 0.053066 |
167 | Niue | 0.058268 |
168 | Netherlands | 0.029782 |
169 | Norway | 0.661565 |
170 | Nepal | 0.530866 |
171 | Nauru | 0.046705 |
172 | New Zealand | 0.063832 |
173 | Oman | 0.056455 |
174 | Pakistan | 0.088653 |
175 | Panama | 0.169475 |
176 | Pitcairn Islands | 0.282857 |
177 | Peru | 0.051750 |
178 | Spratly Islands | 0.026154 |
179 | Philippines | 0.087797 |
180 | Palau | 0.176909 |
181 | Papua New Guinea | 0.026789 |
182 | Poland | 0.062732 |
183 | Puerto Rico | 0.222873 |
184 | North Korea | 0.315357 |
185 | Portugal | 0.122907 |
186 | Paraguay | 0.141607 |
187 | Palestine | 0.265100 |
188 | French Polynesia | 0.061339 |
189 | Qatar | 0.097631 |
190 | Romania | 0.743275 |
191 | Russia | 0.006057 |
192 | Rwanda | 0.123763 |
193 | Western Sahara | 0.054943 |
194 | Saudi Arabia | 0.188411 |
195 | Scarborough Reef | 0.069580 |
196 | Sudan | 0.145442 |
197 | South Sudan | 0.182804 |
198 | Senegal | 0.041048 |
199 | Serranilla Bank | 0.404349 |
200 | Singapore | 0.266856 |
201 | South Georgia and South Sandwich Islands | 0.149831 |
202 | Saint Helena | 0.428387 |
203 | Solomon Islands | 0.344656 |
204 | Sierra Leone | 0.100867 |
205 | El Salvador | 0.176265 |
206 | San Marino | 0.038682 |
207 | Somaliland | 0.040974 |
208 | Somalia | 0.044995 |
209 | Saint Pierre and Miquelon | 0.699458 |
210 | Republic of Serbia | 0.042408 |
211 | Sao Tome and Principe | 0.915933 |
212 | Suriname | 0.041542 |
213 | Slovakia | 0.113862 |
214 | Slovenia | 0.753359 |
215 | Sweden | 0.001290 |
216 | Swaziland | 0.288331 |
217 | Sint Maarten | 0.081521 |
218 | Seychelles | 0.126393 |
219 | Syria | 0.028406 |
220 | Turks and Caicos Islands | 0.012284 |
221 | Chad | 0.067085 |
222 | Togo | 0.065655 |
223 | Thailand | 0.034735 |
224 | Tajikistan | 0.023857 |
225 | Turkmenistan | 0.006858 |
226 | East Timor | 0.038230 |
227 | Tonga | 0.345105 |
228 | Trinidad and Tobago | 0.020336 |
229 | Tunisia | 0.174806 |
230 | Turkey | 0.047196 |
231 | Tuvalu | 0.041594 |
232 | Taiwan | 0.121632 |
233 | United Republic of Tanzania | 0.922501 |
234 | Uganda | 0.066297 |
235 | Ukraine | 0.088036 |
236 | United States Minor Outlying Islands | 0.061749 |
237 | Uruguay | 0.025120 |
238 | United States of America | 0.098839 |
239 | US Naval Base Guantanamo Bay | 0.013662 |
240 | Uzbekistan | 0.090956 |
241 | Vatican | 0.024224 |
242 | Saint Vincent and the Grenadines | 0.798123 |
243 | Venezuela | 0.164420 |
244 | British Virgin Islands | 0.070114 |
245 | United States Virgin Islands | 0.022017 |
246 | Vietnam | 0.051927 |
247 | Vanuatu | 0.071065 |
248 | Wallis and Futuna | 0.026419 |
249 | Akrotiri Sovereign Base Area | 0.003673 |
250 | Samoa | 0.153828 |
251 | Yemen | 0.044515 |
252 | South Africa | 0.435813 |
253 | Zambia | 0.051639 |
254 | Zimbabwe | 0.044964 |
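A 255-row table is hard to scan for bias; ranking the countries and looking at the spread of scores makes the pattern easier to see. A stdlib sketch over a handful of rows copied from the table above, which happen to include its highest and lowest entries. With the full DataFrame, df.sort_values('Positive class probability', ascending=False) and df['Positive class probability'].describe() do the same job:

```python
import statistics

# A few (country, positive probability) pairs copied from the table above.
rows = [
    ('Republic of Congo', 0.992472),
    ('Bhutan', 0.974151),
    ('Israel', 0.100167),
    ('United States of America', 0.098839),
    ('Malaysia', 0.009275),
    ('Sweden', 0.001290),
    ('Federated States of Micronesia', 0.000912),
]

ranked = sorted(rows, key=lambda row: row[1], reverse=True)
scores = [score for _, score in rows]

print('most positive:', ranked[0][0])    # most positive: Republic of Congo
print('least positive:', ranked[-1][0])  # least positive: Federated States of Micronesia
# Every sentence is identical apart from the country name, so a wide range
# means the name alone is moving the prediction.
print(f'range: {max(scores) - min(scores):.4f}')  # range: 0.9916
print(f'stdev: {statistics.stdev(scores):.4f}')
```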