GPT2 LM#

This tutorial is available as an IPython notebook at Malaya/example/gpt2-lm.

[1]:
import os

os.environ['CUDA_VISIBLE_DEVICES'] = ''
[2]:
import malaya
/home/husein/dev/malaya/malaya/tokenizer.py:202: FutureWarning: Possible nested set at position 3361
  self.tok = re.compile(r'({})'.format('|'.join(pipeline)))
/home/husein/dev/malaya/malaya/tokenizer.py:202: FutureWarning: Possible nested set at position 3879
  self.tok = re.compile(r'({})'.format('|'.join(pipeline)))

Dependency#

Make sure you have installed the dependency below,

pip3 install transformers

List available GPT2 models#

[3]:
malaya.language_model.available_gpt2()
[3]:
                                   Size (MB)
mesolitica/gpt2-117m-bahasa-cased        454

Load GPT2 LM model#

def gpt2(model: str = 'mesolitica/gpt2-117m-bahasa-cased', force_check: bool = True, **kwargs):
    """
    Load GPT2 language model.

    Parameters
    ----------
    model: str, optional (default='mesolitica/gpt2-117m-bahasa-cased')
        Check available models at `malaya.language_model.available_gpt2()`.
    force_check: bool, optional (default=True)
        Force check that the model is one of the Malaya models.
        Set to False if you want to load your own Hugging Face model.

    Returns
    -------
    result: malaya.torch_model.gpt2_lm.LM class
    """

If you have another model from Hugging Face and want to load it with malaya.torch_model.gpt2_lm.LM, set force_check=False.
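As a minimal sketch, assuming you have a compatible GPT2 causal LM hosted on the Hugging Face hub (the repository name below is hypothetical, substitute your own):

model = malaya.language_model.gpt2(
    model='your-username/gpt2-malay',  # hypothetical Hugging Face repo
    force_check=False,
)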

[9]:
model = malaya.language_model.gpt2()
[5]:
model.score('saya suke awak')
[5]:
-51.384037494659424
[6]:
model.score('saya suka awak')
[6]:
-46.20505475997925
[7]:
model.score('najib razak')
[7]:
-48.355817794799805
[8]:
model.score('najib comel')
[8]:
-52.79337692260742
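The score is the sentence log-likelihood under the model, so a higher (less negative) value means the model finds the string more fluent; the correctly spelled saya suka awak outranks the misspelled saya suke awak above. A minimal sketch that uses this to pick the most fluent candidate, assuming model.score accepts a single string as in the cells above:

candidates = ['saya suke awak', 'saya suka awak']
# higher (less negative) log-likelihood = more fluent under the model
best = max(candidates, key=model.score)
print(best)  # saya suka awak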