Speech Toolkit#


Malaya-Speech is a speech toolkit library for Bahasa Malaysia, powered by TensorFlow and PyTorch.

Documentation#

Documentation for the stable release is available at https://malaya-speech.readthedocs.io/en/stable/

Installing from PyPI#

$ pip install malaya-speech

This automatically installs all dependencies except TensorFlow and PyTorch, so you can choose your own CPU or GPU build of each.

Only Python >= 3.6.0, TensorFlow >= 1.15.0, and PyTorch >= 1.10 are supported.
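
Because TensorFlow and PyTorch are installed separately, it can help to confirm that an environment meets the minimums above before loading any model. The following is a minimal sketch, not part of Malaya-Speech: the helper names `parse_version`, `meets_minimum`, `check_environment` and the `MINIMUMS` table are hypothetical and exist only for illustration. Versions are compared numerically, since a plain string comparison would wrongly rank 1.9 above 1.10.

```python
import sys

# Minimum versions taken from the requirement line above.
# The dictionary keys are the import names of the two frameworks.
MINIMUMS = {"tensorflow": "1.15.0", "torch": "1.10"}


def parse_version(text):
    """Turn '1.15.0' or '2.1.0+cu118' into a tuple of ints such as (1, 15, 0)."""
    parts = []
    # Drop any local build suffix such as '+cu118' before splitting.
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True when the installed version is at least the minimum."""
    a, b = parse_version(installed), parse_version(minimum)
    # Pad to equal length so that 1.10 and 1.10.0 compare as equal.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment():
    """Map each requirement to True, False, or None (framework not installed)."""
    report = {"python": sys.version_info[:3] >= (3, 6, 0)}
    for module_name, minimum in MINIMUMS.items():
        try:
            module = __import__(module_name)
        except ImportError:
            report[module_name] = None
            continue
        report[module_name] = meets_minimum(module.__version__, minimum)
    return report
```

Calling `check_environment()` returns a dictionary keyed by `python`, `tensorflow` and `torch`. A `None` value means the framework is not installed, which is acceptable if you only plan to use models from the other framework.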

Development Release#

Install from the master branch:

$ pip install git+https://github.com/huseinzol05/malaya-speech.git

We recommend using virtualenv for development.

Documentation for the development release is available at https://malaya-speech.readthedocs.io/en/latest/

Features#

  • Age Detection, detect age in speech using Finetuned Speaker Vector.

  • Speaker Diarization, diarizing speakers using Pretrained Speaker Vector.

  • Emotion Detection, detect emotions in speech using Finetuned Speaker Vector.

  • Force Alignment, generate a time-aligned transcription of an audio file using RNNT, Wav2Vec2 CTC and Whisper Seq2Seq.

  • Gender Detection, detect genders in speech using Finetuned Speaker Vector.

  • Clean Speech Detection, detect clean speech using Finetuned Speaker Vector.

  • Language Detection, detect hyperlocal languages in speech using Finetuned Speaker Vector.

  • Language Model, score ASR decoder output using KenLM, masked language models (BERT and RoBERTa), and GPT2.

  • Multispeaker Separation, separate multiple speakers using FastSep on 8 kHz WAV audio.

  • Noise Reduction, reduce multilevel noise using STFT UNET.

  • Speaker Change, detect speaker changes using Finetuned Speaker Vector.

  • Speaker Overlap, detect overlapping speakers using Finetuned Speaker Vector.

  • Speaker Vector, calculate similarity between speakers using Pretrained Speaker Vector.

  • Speech Enhancement, enhance voice activities using Waveform UNET.

  • SpeechSplit Conversion, detailed speaking style conversion by disentangling speech into content, timbre, rhythm and pitch using PyWorld and PySPTK.

  • Speech-to-Text, end-to-end speech-to-text for Malay, Mixed (Malay, Singlish) and Singlish using RNNT, Wav2Vec2 CTC and Whisper Seq2Seq.

  • Super Resolution, 4x super resolution for waveforms using ResNet UNET and Neural Vocoder.

  • Text-to-Speech, text-to-speech for Malay and Singlish using Tacotron2, FastSpeech2, FastPitch, GlowTTS, LightSpeech and VITS.

  • Vocoder, convert Mel to Waveform using MelGAN, Multiband MelGAN and Universal MelGAN Vocoder.

  • Voice Activity Detection, detect voice activities using Finetuned Speaker Vector.

  • Voice Conversion, Many-to-One and Zero-shot Voice Conversion.

  • Hybrid 8-bit Quantization, provides hybrid 8-bit quantization for all models, reducing inference time by up to 2x and model size by up to 4x.

  • Real-Time Interface, provides PyAudio and TorchAudio streaming interfaces for real-time inference.
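
The "up to 4x" model size figure in the Hybrid 8-bit Quantization item is consistent with storing 32-bit float weights as 8-bit integers (4 bytes per weight down to 1). The sketch below illustrates that arithmetic with a symmetric per-tensor scheme using only the Python standard library. It is an assumption made for illustration, not the quantization code Malaya-Speech actually uses, and `quantize_int8` and `dequantize` are hypothetical helper names.

```python
from array import array


def quantize_int8(weights):
    """Symmetric per-tensor quantization: float weights -> (int8 codes, scale)."""
    peak = max(abs(w) for w in weights)
    # Map the largest magnitude to 127; fall back to 1.0 for an all-zero tensor.
    scale = peak / 127.0 if peak > 0 else 1.0
    codes = array("b", (max(-127, min(127, round(w / scale))) for w in weights))
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from int8 codes."""
    return [c * scale for c in codes]


# Toy weights stored as 32-bit floats (array type code 'f').
weights = array("f", [0.5, -1.0, 0.25, 0.0, 0.75, -0.125, 1.0, -0.6])
codes, scale = quantize_int8(weights)

float_bytes = len(weights) * weights.itemsize  # 4 bytes per weight
int8_bytes = len(codes) * codes.itemsize       # 1 byte per weight
size_ratio = float_bytes / int8_bytes

# Rounding to the nearest code bounds the error by about half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, dequantize(codes, scale)))

print("size ratio:", size_ratio)
print("max reconstruction error:", max_error)
```

In a real model the saving can be smaller than 4x, because scales, metadata and any layers left in float are stored alongside the 8-bit weights, which fits the "up to" wording.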

Pretrained Models#

Malaya-Speech also releases its pretrained models; see malaya-speech/pretrained-model.

References#

If you use our software for research, please cite:

@misc{Malaya,
  note = {Speech-Toolkit library for bahasa Malaysia, powered by Deep Learning Tensorflow},
  author = {Husein, Zolkepli},
  title = {Malaya-Speech},
  year = {2020},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/huseinzol05/malaya-speech}}
}

Acknowledgement#

Thanks to KeyReply for the private V100 cloud and Mesolitica for the private RTX cloud used to train Malaya-Speech models.