Running on Windows

UnicodeDecodeError: 'charmap' codec can't decode byte

To solve this, enable UTF-8 as the system locale,

  1. Open Windows Settings > Administrative language settings > Change system locale.

  2. Check "Beta: Use Unicode UTF-8 for worldwide language support".

  3. Restart Windows.

After the restart, everything worked well.

For the full discussion, check issue 25.
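The error appears when Python decodes UTF-8 text with a legacy Windows code page. Below is a minimal sketch that reproduces the failure and checks which encoding Python prefers on your machine. It uses only the standard library and assumes cp1252 as a typical legacy code page. The helper names are illustrative and not part of Malaya.

```python
import locale


def uses_utf8_by_default() -> bool:
    """Return True if Python's preferred text encoding is UTF-8."""
    enc = locale.getpreferredencoding(False)
    return enc.lower().replace("-", "").replace("_", "") == "utf8"


def decodes_with(data: bytes, encoding: str) -> bool:
    """Return True if `data` can be decoded with `encoding`."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# A right double quotation mark encoded as UTF-8 (three bytes).
sample = "\u201d".encode("utf-8")

print("preferred encoding is UTF-8:", uses_utf8_by_default())
print("decodes as utf-8:", decodes_with(sample, "utf-8"))
print("decodes as cp1252:", decodes_with(sample, "cp1252"))
```

If the first line prints False on Windows, files opened without an explicit encoding are decoded with the legacy code page, which is where the 'charmap' error comes from.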

youtokentome failed to build

YouTokenToMe requires Cython and Microsoft Visual C++ 14.0 to compile. Windows users usually break at this step, so we need to install Malaya without YouTokenToMe,

pip install malaya --no-deps
pip install "tensorflow>=1.15"

If we skip YouTokenToMe, we are not able to use,

  1. language-detection module, https://malaya.readthedocs.io/en/latest/load-language-detection.html

  2. True Case module, https://malaya.readthedocs.io/en/latest/load-true-case.html

  3. Multinomial model in emotion analysis, https://malaya.readthedocs.io/en/latest/load-emotion.html#Load-multinomial-model

  4. Multinomial model in sentiment analysis, https://malaya.readthedocs.io/en/latest/load-sentiment.html#Load-multinomial-model

  5. Multinomial model in subjectivity analysis, https://malaya.readthedocs.io/en/latest/load-subjectivity.html#Load-multinomial-model

  6. Multinomial model in toxicity analysis, https://malaya.readthedocs.io/en/latest/load-toxic.html#Load-multinomial-model
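Before calling any of the modules above, it may help to check whether YouTokenToMe was actually installed. This is a minimal sketch, assuming the import name is `youtokentome`. `has_module` is a hypothetical helper, not a Malaya API.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


if has_module("youtokentome"):
    print("YouTokenToMe found, the modules listed above should be usable.")
else:
    print("YouTokenToMe missing, the modules listed above will not work.")
```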

If you still need these models, you need to install Cython,

pip install cython

Then install the Visual Studio build tools by following https://docs.microsoft.com/en-us/visualstudio/install/create-an-offline-installation-of-visual-studio?view=vs-2019, choosing Visual Studio 2019 Build Tools, vs_buildtools.exe.

After that, follow https://stackoverflow.com/questions/43847542/rc-exe-no-longer-found-in-vs-2015-command-prompt, which covers the case where rc.exe is not found.
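To see whether the build tools are visible from the prompt you run pip in, a rough check is to look them up on PATH. This is only a sketch. It assumes `cl` and `rc` are the executable names of the MSVC compiler and resource compiler, and `find_tools` is a hypothetical helper. The build may locate the compiler by other means, so a missing entry here does not necessarily mean the build will fail.

```python
import shutil


def find_tools(names=("cl", "rc")):
    """Map each tool name to its full path on PATH, or None if not found."""
    return {name: shutil.which(name) for name in names}


for tool, path in find_tools().items():
    print(tool, "->", path or "not found on PATH")
```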

Unable to use any T5 models

T5 depends on tensorflow-text, and currently there is no official tensorflow-text binary released for Windows, so T5 models are not available to Windows users.

List of T5 models,

  1. malaya.summarization.abstractive.transformer

  2. malaya.generator.transformer

  3. malaya.paraphrase.transformer
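Code that has to run on several platforms could guard the T5 modules listed above with a small check. This is a minimal sketch, assuming the import name of tensorflow-text is `tensorflow_text`. `t5_usable` is a hypothetical helper, not a Malaya API.

```python
import importlib.util
import platform


def t5_usable(system=None, has_tf_text=None) -> bool:
    """Guess whether T5 models can be loaded on this machine."""
    if system is None:
        system = platform.system()
    if has_tf_text is None:
        has_tf_text = importlib.util.find_spec("tensorflow_text") is not None
    # Assumption taken from the text above: no tensorflow-text build on Windows.
    return system != "Windows" and has_tf_text


if not t5_usable():
    print("T5 models are not available here, skip the T5 modules.")
```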