GPT2 LM#
This tutorial is available as an IPython notebook at Malaya/example/gpt2-lm.
[1]:
import os
# hide all CUDA devices so the model loads and runs on CPU
os.environ['CUDA_VISIBLE_DEVICES'] = ''
[2]:
import malaya
/home/husein/dev/malaya/malaya/tokenizer.py:202: FutureWarning: Possible nested set at position 3361
self.tok = re.compile(r'({})'.format('|'.join(pipeline)))
/home/husein/dev/malaya/malaya/tokenizer.py:202: FutureWarning: Possible nested set at position 3879
self.tok = re.compile(r'({})'.format('|'.join(pipeline)))
List available GPT2 models#
[3]:
malaya.language_model.available_gpt2()
[3]:
| | Size (MB) |
|---|---|
| mesolitica/gpt2-117m-bahasa-cased | 454 |
Load GPT2 LM model#
def gpt2(model: str = 'mesolitica/gpt2-117m-bahasa-cased', force_check: bool = True, **kwargs):
"""
Load GPT2 language model.
Parameters
----------
model: str, optional (default='mesolitica/gpt2-117m-bahasa-cased')
Check available models at `malaya.language_model.available_gpt2()`.
force_check: bool, optional (default=True)
    Force check that the model is one of the Malaya models.
    Set to False if you have your own HuggingFace model.
Returns
-------
result: malaya.torch_model.gpt2_lm.LM class
"""
If you have another model from HuggingFace and want to load it with malaya.torch_model.gpt2_lm.LM, set force_check=False.
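For example, a minimal sketch of loading your own checkpoint; the model name below is hypothetical:

[ ]:
# a minimal sketch, assuming you have pushed your own GPT2 checkpoint
# to HuggingFace; 'your-username/gpt2-bahasa-custom' is a hypothetical name
custom_model = malaya.language_model.gpt2(
    model='your-username/gpt2-bahasa-custom',
    force_check=False,
)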
[9]:
model = malaya.language_model.gpt2()
[5]:
model.score('saya suke awak')
[5]:
-51.384037494659424
[6]:
model.score('saya suka awak')
[6]:
-46.20505475997925
[7]:
model.score('najib razak')
[7]:
-48.355817794799805
[8]:
model.score('najib comel')
[8]:
-52.79337692260742
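The scores are sentence log-likelihoods, so a higher (less negative) value means the model finds the sentence more fluent: the correctly spelled saya suka awak outscores the misspelled saya suke awak above. One practical use is ranking candidate sentences, sketched below with hypothetical candidates:

[ ]:
# a minimal sketch: rank candidate sentences by GPT2 log-likelihood,
# e.g. to pick the most fluent spelling-correction candidate
candidates = ['saya suke awak', 'saya suka awak']
best = max(candidates, key=model.score)
print(best)  # expected: 'saya suka awak', the higher-scoring candidate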