Speech Toolkit



Malaya-Speech is a speech toolkit library for bahasa Malaysia, powered by deep learning with TensorFlow. We maintain it in a separate repository: https://github.com/huseinzol05/malaya-speech

Documentation

Full documentation is available at https://malaya-speech.readthedocs.io/

Installing from PyPI

CPU version

$ pip install malaya-speech

GPU version

$ pip install malaya-speech[gpu]

Only Python 3.6.0 and above and TensorFlow 1.15.0 and above are supported.

We recommend using virtualenv for development. All examples were tested on TensorFlow versions 1.15.4 and 2.4.1.
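Since the library states minimum Python and TensorFlow versions, a quick preflight check can catch an unsupported environment before installing. This is a minimal, standalone sketch (not part of Malaya-Speech itself); the version numbers come from the requirements above:

```python
import sys

def parse_version(v):
    # Convert a version string like "1.15.4" into a tuple (1, 15, 4)
    # so versions compare correctly as tuples of integers.
    return tuple(int(part) for part in v.split("."))

def environment_ok(tf_version, min_python=(3, 6, 0), min_tf="1.15.0"):
    """Return True if the running Python and the given TensorFlow
    version string meet the stated minimums."""
    python_ok = sys.version_info[:3] >= min_python
    tf_ok = parse_version(tf_version) >= parse_version(min_tf)
    return python_ok and tf_ok

print(environment_ok("2.4.1"))   # True - one of the tested TF versions
print(environment_ok("1.14.0"))  # False - below the 1.15.0 minimum
```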

Features

  • Age Detection, detect age in speech using Finetuned Speaker Vector.

  • Speaker Diarization, diarizing speakers using Pretrained Speaker Vector.

  • Emotion Detection, detect emotions in speech using Finetuned Speaker Vector.

  • Gender Detection, detect genders in speech using Finetuned Speaker Vector.

  • Language Detection, detect hyperlocal languages in speech using Finetuned Speaker Vector.

  • Multispeaker Separation, separate multiple speakers using FastSep on 8 kHz WAV.

  • Noise Reduction, reduce multilevel noises using STFT UNET.

  • Speaker Change, detect changing speakers using Finetuned Speaker Vector.

  • Speaker Overlap, detect overlapping speakers using Finetuned Speaker Vector.

  • Speaker Vector, calculate similarity between speakers using Pretrained Speaker Vector.

  • Speech Enhancement, enhance voice activities using Waveform UNET.

  • SpeechSplit Conversion, detailed speaking style conversion by disentangling speech into content, timbre, rhythm and pitch using PyWorld and PySPTK.

  • Speech-to-Text, End-to-End Speech to Text for Malay and Mixed (Malay and Singlish) using RNN-Transducer and Wav2Vec2 CTC.

  • Super Resolution, 4x super resolution for waveforms.

  • Text-to-Speech, Text to Speech for Malay and Singlish using Tacotron2 and FastSpeech2.

  • Vocoder, convert Mel to Waveform using MelGAN, Multiband MelGAN and Universal MelGAN Vocoder.

  • Voice Activity Detection, detect voice activities using Finetuned Speaker Vector.

  • Voice Conversion, Many-to-One, One-to-Many, Many-to-Many, and Zero-shot Voice Conversion.

  • Hybrid 8-bit Quantization, provide hybrid 8-bit quantization for all models to reduce inference time up to 2x and model size up to 4x.
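Several features above (speaker diarization, speaker change, speaker similarity) rest on comparing speaker vectors. As a toy illustration of the idea, not the library's actual API, cosine similarity between two embeddings can be computed with plain NumPy; the 4-dimensional vectors below are made up for the example, while real speaker vectors have hundreds of dimensions:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two speaker embedding vectors."""
    a = np.asarray(a, dtype=np.float32)
    b = np.asarray(b, dtype=np.float32)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings: two clips of the same speaker should score
# higher against each other than against a different speaker.
same_speaker = cosine_similarity([0.9, 0.1, 0.4, 0.2], [0.8, 0.2, 0.5, 0.1])
diff_speaker = cosine_similarity([0.9, 0.1, 0.4, 0.2], [-0.3, 0.9, -0.2, 0.7])
print(same_speaker > diff_speaker)  # True
```

A diarization or speaker-change pipeline typically thresholds such scores to decide whether two segments belong to the same speaker.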
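The 8-bit quantization bullet can also be illustrated with a minimal NumPy sketch (a conceptual example, not the library's actual quantization code): symmetric per-tensor quantization maps float32 weights onto int8, cutting storage by 4x while keeping the round-trip error within one quantization step:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor quantization of float32 weights to int8."""
    scale = np.abs(w).max() / 127.0 or 1.0  # guard against an all-zero tensor
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map int8 values back to approximate float32 weights."""
    return q.astype(np.float32) * scale

w = np.random.randn(256, 256).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

print(w.nbytes // q.nbytes)             # 4 - int8 storage is 4x smaller
print(np.abs(w - w_hat).max() < scale)  # True - error bounded by one step
```

In practice, hybrid quantization keeps some operations in float while storing weights in int8, which is how both the size and inference-time savings quoted above are obtained.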

Pretrained Models

Malaya-Speech also releases pretrained models; see malaya-speech/pretrained-model

References

If you use our software for research, please cite:

@misc{MalayaSpeech,
  author = {Husein, Zolkepli},
  title = {Malaya-Speech: Speech-Toolkit library for bahasa Malaysia, powered by Deep Learning Tensorflow},
  year = {2020},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/huseinzol05/malaya-speech}}
}

Acknowledgement

Thanks to KeyReply for sponsoring the private cloud used to train Malaya-Speech models; without it, this library would not exist.
