{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Running on Windows" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### UnicodeDecodeError: 'charmap' codec can't decode byte\n", "\n", "To solve this:\n", "\n", "1. Open Windows Settings > Administrative language settings > Change system locale.\n", "2. Check **Beta: Use Unicode UTF-8 for worldwide language support**.\n", "3. Restart Windows.\n", "\n", "After restarting, everything works well.\n", "\n", "For the full discussion, see [issue 25](https://github.com/huseinzol05/Malaya/issues/25)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### YouTokenToMe failed to build\n", "\n", "[YouTokenToMe](https://github.com/VKCOM/YouTokenToMe) requires Cython and Microsoft Visual C++ 14.0 to compile, and the build usually fails on Windows. To work around this, install Malaya without [YouTokenToMe](https://github.com/VKCOM/YouTokenToMe):\n", "\n", "```bash\n", "pip install malaya --no-deps\n", "pip install \"tensorflow>=1.15\"\n", "```\n", "\n", "The quotes around `tensorflow>=1.15` are needed because both bash and the Windows command prompt otherwise treat `>` as output redirection.\n", "\n", "Without [YouTokenToMe](https://github.com/VKCOM/YouTokenToMe), the following models are not available:\n", "\n", "1. True Case TensorFlow deep learning models.\n", "2. Multinomial model in emotion analysis.\n", "3. Multinomial model in sentiment analysis.\n", "4. Multinomial model in subjectivity analysis.\n", "5. Multinomial model in toxicity analysis.\n", "6. Jawi-to-Rumi TensorFlow deep learning models.\n", "7. Rumi-to-Jawi TensorFlow deep learning models.\n", "8. Syllable TensorFlow deep learning models.\n", "9. Language detection TensorFlow deep learning models.\n", "\n", "If you still need these models, install Cython:\n", "\n", "```bash\n", "pip install cython\n", "```\n", "\n", "Then install Visual Studio from https://docs.microsoft.com/en-us/visualstudio/install/create-an-offline-installation-of-visual-studio?view=vs-2019 and choose Visual Studio 2019 Build Tools ([vs_buildtools.exe](https://visualstudio.microsoft.com/thank-you-downloading-visual-studio/?sku=buildtools&rel=16&utm_medium=microsoft&utm_source=docs.microsoft.com&utm_campaign=offline+install&utm_content=download+vs2019)).\n", "\n", "If the build then reports that `rc.exe` cannot be found, follow https://stackoverflow.com/questions/43847542/rc-exe-no-longer-found-in-vs-2015-command-prompt\n", "\n", "According to https://github.com/VKCOM/YouTokenToMe/issues/96, `Visual Studio 2022 with only Python Build Tools` is also enough.\n", "\n", "**For now, I do not plan to migrate YTTM to another tokenizer library.**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Unable to use any T5 TensorFlow models\n", "\n", "T5 depends on tensorflow-text, and there is currently no official tensorflow-text binary for Windows, so T5 TensorFlow models are not available to Windows users.\n", "\n", "However, you can use the T5 HuggingFace models instead, for example `malaya.dependency.huggingface()`." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" } }, "nbformat": 4, "nbformat_minor": 2 }