Running on Windows

UnicodeDecodeError: 'charmap' codec can't decode byte

To solve this, switch the Windows system locale to UTF-8,

  1. Open Windows Settings > Administrative language settings > Change system locale.

  2. Check "Beta: Use Unicode UTF-8 for worldwide language support".

  3. Restart Windows.

After the restart, everything works well.
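
The error comes from Python decoding text with the locale's default codec (typically cp1252 on Western-locale Windows) instead of UTF-8. The sketch below is not part of Malaya; the helper names are hypothetical. It shows how to check which encoding Python picks by default, and how to read your own text files as UTF-8 regardless of the system locale.

```python
import locale
import os
import tempfile


def default_text_encoding():
    """Return the locale encoding that open() falls back to when none is given."""
    return locale.getpreferredencoding(False)


def read_text_utf8(path):
    """Read a text file as UTF-8, independent of the system locale."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()


def demo():
    """Write a UTF-8 file containing curly quotes, then read it back."""
    # The closing quote U+201D ends in byte 0x9D in UTF-8,
    # which has no mapping in cp1252.
    text = "\u201cMalaya\u201d"
    path = os.path.join(tempfile.mkdtemp(), "sample.txt")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
    return read_text_utf8(path) == text


if __name__ == "__main__":
    print("default encoding:", default_text_encoding())
    print("round trip ok:", demo())
```

If the first line printed is not a UTF-8 encoding, any file read without an explicit encoding is decoded with the locale codec, which is where the 'charmap' error appears. Passing an explicit encoding only helps for files you open yourself; the locale change above is still the fix reported in the issue.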

For the full discussion, see issue 25.

youtokentome failed to build

YouTokenToMe requires Cython to compile, and the build usually fails for Windows users at this step, so we need to install Malaya without YouTokenToMe,

pip install malaya --no-deps
pip install tensorflow==1.15
pip install bert-tensorflow albert-tensorflow

If we skip YouTokenToMe, we are not able to use,

  1. language-detection module, https://malaya.readthedocs.io/en/latest/load-language-detection.html

  2. True Case module, https://malaya.readthedocs.io/en/latest/load-true-case.html
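
Before calling the modules listed above, it can help to confirm whether YouTokenToMe is actually importable. This is a minimal sketch using only the standard library; the helper names are hypothetical and not part of Malaya, and it assumes the package is imported under the name youtokentome.

```python
import importlib.util

# Optional dependencies to probe; "youtokentome" is the import name
# assumed for the YouTokenToMe package.
OPTIONAL_MODULES = ("youtokentome",)


def has_module(name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(name) is not None


def missing_optional_modules(names=OPTIONAL_MODULES):
    """Return the subset of names that cannot be found on this system."""
    return [name for name in names if not has_module(name)]


if __name__ == "__main__":
    missing = missing_optional_modules()
    if missing:
        print("Not installed:", ", ".join(missing))
        print("Language detection and true case modules will not work.")
    else:
        print("YouTokenToMe is available.")
```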

Unable to use any T5 models

T5 depends on tensorflow-text, and there is currently no official tensorflow-text binary released for Windows, so T5 models are not available to Windows users.

List of T5 models,

  1. malaya.summarization.abstractive.t5

  2. malaya.generator.t5

  3. malaya.paraphrase.t5
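
The T5 restriction can also be checked at runtime before trying to load one of the models above. The sketch below is an illustration only, with hypothetical helper names; it assumes tensorflow-text is imported under the name tensorflow_text and simply reports which of the listed T5 entry points would be unusable.

```python
import importlib.util

# Names copied from the list above.
T5_ENTRY_POINTS = (
    "malaya.summarization.abstractive.t5",
    "malaya.generator.t5",
    "malaya.paraphrase.t5",
)


def tensorflow_text_installed():
    """Return True if the tensorflow_text module can be found."""
    return importlib.util.find_spec("tensorflow_text") is not None


def unavailable_t5_entry_points(has_tf_text=None):
    """Return the T5 entry points that cannot be used without tensorflow-text."""
    if has_tf_text is None:
        has_tf_text = tensorflow_text_installed()
    return () if has_tf_text else T5_ENTRY_POINTS


if __name__ == "__main__":
    for name in unavailable_t5_entry_points():
        print("Unavailable without tensorflow-text:", name)
```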

Lack of development on Windows

Malaya has been tested on modern UNIX / Linux systems that support the TensorFlow binary. Windows support is not on our development path right now (we don’t have any Windows machines here).