Welcome to Malaya’s documentation!
Malaya is a Natural-Language-Toolkit library for bahasa Malaysia, powered by PyTorch.
Documentation
Full documentation is available at https://malaya.readthedocs.io/
Installing from PyPI
$ pip install malaya
This automatically installs all dependencies except PyTorch, so you can choose the PyTorch build (CPU or GPU) that suits your machine.
Only Python >= 3.6.0 and PyTorch >= 1.10 are supported.
If you are a Windows user, make sure to read https://malaya.readthedocs.io/en/latest/running-on-windows.html
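Because PyTorch is installed separately, it can help to confirm that the environment meets the stated minimums before importing Malaya. The following is a minimal sketch, not part of Malaya itself: the helper names `parse_version`, `check_environment`, and `installed_torch_version` are hypothetical and exist only for this illustration. It compares the running Python version and the installed PyTorch version against the requirements listed above.

```python
import re
import sys

# Minimum versions taken from the requirements stated above.
MIN_PYTHON = (3, 6, 0)
MIN_TORCH = (1, 10)


def parse_version(text):
    """Return the leading numeric components of a version string as a tuple.

    Local build tags such as "+cu113" and pre-release suffixes are ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def check_environment(python_version, torch_version):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if tuple(python_version) < MIN_PYTHON:
        problems.append("Python >= 3.6.0 is required")
    if torch_version is None:
        problems.append("PyTorch is not installed")
    elif parse_version(torch_version) < MIN_TORCH:
        problems.append("PyTorch >= 1.10 is required")
    return problems


def installed_torch_version():
    """Return the installed PyTorch version string, or None if it is missing."""
    try:
        import torch
    except ImportError:
        return None
    return torch.__version__


if __name__ == "__main__":
    issues = check_environment(sys.version_info[:3], installed_torch_version())
    print("\n".join(issues) if issues else "Environment looks compatible.")
```

Running the script inside the environment where Malaya will be installed prints either the list of unmet requirements or a confirmation message.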
Development Release
Install from the master branch:
$ pip install git+https://github.com/huseinzol05/malaya.git
We recommend using virtualenv for development.
Documentation for the development release is available at https://malaya.readthedocs.io/en/latest/
Pretrained Models
Malaya has also released Malaysian pretrained models, available at https://huggingface.co/mesolitica
References
If you use our software for research, please cite:
@misc{Malaya,
note = {Natural-Language-Toolkit library for bahasa Malaysia, powered by PyTorch},
author = {Husein, Zolkepli},
title = {Malaya},
year = {2018},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/mesolitica/malaya}}
}
Acknowledgement
Thanks to:
- KeyReply for the private V100 cloud.
- NVIDIA for Azure credit.
- TensorFlow Research Cloud for free TPU access.
Contributing
Thank you for contributing to this library; it really helps a lot. Feel free to contact me with suggestions or to contribute in other ways. We accept everything, not just code!
Contents:
- Speech Toolkit
- Installation
- Dataset
- Running on Windows
- Contributing
- API
  - malaya
  - malaya.augmentation.abstractive
  - malaya.augmentation.rules
  - malaya.dictionary
  - malaya.generator.isi_penting
  - malaya.keyword.abstractive
  - malaya.keyword.extractive
  - malaya.normalizer.rules
  - malaya.qa.extractive
  - malaya.similarity.doc2vec
  - malaya.similarity.semantic
  - malaya.spelling_correction.jamspell
  - malaya.spelling_correction.probability
  - malaya.spelling_correction.spylls
  - malaya.spelling_correction.symspell
  - malaya.summarization.abstractive
  - malaya.summarization.extractive
  - malaya.topic_model.decomposition
  - malaya.topic_model.transformer
  - malaya.zero_shot.classification
  - malaya.cluster
  - malaya.constituency
  - malaya.dependency
  - malaya.embedding
  - malaya.emotion
  - malaya.entity
  - malaya.jawi
  - malaya.knowledge_graph
  - malaya.language_detection
  - malaya.language_model
  - malaya.llm
  - malaya.nsfw
  - malaya.num2word
  - malaya.paraphrase
  - malaya.pos
  - malaya.preprocessing
  - malaya.segmentation
  - malaya.sentiment
  - malaya.stack
  - malaya.stem
  - malaya.syllable
  - malaya.tatabahasa
  - malaya.tokenizer
  - malaya.transformer
  - malaya.translation
  - malaya.true_case
  - malaya.word2num
  - malaya.wordvector
  - malaya.model.extractive_summarization
  - malaya.model.ml
  - malaya.model.rules
  - malaya.torch_model.gpt2_lm
  - malaya.torch_model.huggingface
  - malaya.torch_model.mask_lm
- Preprocessing
- Demoji
- Stemmer and Lemmatization
- True Case
- Segmentation
- Num2Word
- Word2Num
- Rules based Normalizer
  - Load normalizer
  - Use translator
  - Use segmenter
  - Use stemmer
  - Validate uppercase
  - Validate non human word
  - Skip spelling correction
  - Pass kwargs preprocessing
  - Normalize text
  - Normalize url
  - Normalize email
  - Normalize year
  - Normalize telephone
  - Normalize date
  - Normalize time
  - Normalize emoji
  - Normalize elongated
  - Normalize hingga
  - Normalize pada hari bulan
  - Normalize fraction
  - Normalize money
  - Normalize units
  - Normalize percents
  - Normalize IC
  - Normalize Numbers
  - Normalize x kali
  - Normalize Cardinals
  - Normalize Ordinals
  - Normalize entity